Abstract


Privacy policies often place restrictions on the purposes for which a governed entity may use personal
information. For example, regulations, such as the Health Insurance Portability and Accountability Act (HIPAA),
require  that  hospital  employees  use  medical  information  for  only  certain  purposes,  such  as  treatment,  but  not 
for  others,  such  as  gossip.  Thus,  using  formal  or  automated  methods  for  enforcing  privacy  policies  requires 
a  semantics  of  purpose  restrictions  to  determine  whether  an  action  is  for  a  purpose.  We  provide  such  a 
semantics  using  a  formalism  based  on  planning.  We  model  planning  using  a  modified  version  of  Markov 
Decision Processes (MDPs), which exclude redundant actions for a formal definition of redundant. We
argue that an action is for a purpose if and only if the action is part of a plan for optimizing the satisfaction of
that  purpose  under  the  MDP  model.  We  use  this  formalization  to  define  when  a  sequence  of  actions  is  only 
for  or  not  for  a  purpose.  This  semantics  enables  us  to  create  and  implement  an  algorithm  for  automating 
auditing,  and  to  describe  formally  and  compare  rigorously  previous  enforcement  methods.  We  extend  this 
formalization  to  Partially  Observable  Markov  Decision  Processes  (POMDPs)  to  answer  when  information 
is  used  for  a  purpose.  To  validate  our  semantics,  we  provide  an  example  application  and  conduct  a  survey 
to compare our semantics to how people commonly understand the word “purpose”.

Chapter 1


Introduction 


1.1  Motivation  of  Problem 

Purpose  is  a  key  concept  for  privacy  policies.  For  example,  the  European  Union  requires  that  [The95]: 

Member  States  shall  provide  that  personal  data  must  be  [. . .]  collected  for  specified,  explicit 
and  legitimate  purposes  and  not  further  processed  in  a  way  incompatible  with  those  purposes. 

The  United  States  also  has  laws  placing  purpose  restrictions  on  information  in  some  domains  such  as  the 
Health  Insurance  Portability  and  Accountability  Act  (HIPAA)  [Off03]  for  medical  information  and  the 
Gramm-Leach-Bliley Act [Uni10] for financial records. These laws and best practices motivate
organizations to discuss in their privacy policies the purposes for which they will use information.

Some privacy policies warn users that the policy provider may use certain information for certain
purposes. For example, the privacy policy of a medical provider states, “We may disclose your [protected health
information]  for  public  health  activities  and  purposes  [. . .]”  [Was03].  Such  warnings  do  not  constrain  the 
behavior  of  the  policy  provider. 

Other  policies  that  prohibit  using  certain  information  for  a  purpose  do  constrain  the  behavior  of  the 
policy provider. One example is the privacy policy of Yahoo! Email, which states that “Yahoo!’s practice
is not to use the content of messages stored in your Yahoo! Mail account for marketing purposes” [Yah10b,
emphasis  added]. 

Some  policies  even  limit  the  use  of  certain  information  to  an  explicit  list  of  purposes.  The  privacy  policy 
of  The  Bank  of  America  states,  “Employees  are  authorized  to  access  Customer  Information  for  business 
purposes only.” [Ban05, emphasis added]. The HIPAA Privacy Rule [Off03] requires that covered entities
(e.g.,  health  care  providers  and  business  partners)  only  use  or  disclose  protected  health  information  about  a 
patient  with  that  patient’s  written  authorization  or: 

[. . .] for the following purposes or situations: (1) To the Individual [. . .]; (2) Treatment,
Payment, and Health Care Operations; (3) Opportunity to Agree or Object; (4) Incident to an
otherwise permitted use and disclosure; (5) Public Interest and Benefit Activities; and (6) Limited
Data Set for the purposes of research, public health or health care operations.

These  examples  show  that  verifying  that  an  organization  obeys  a  privacy  policy  requires  a  semantics  of 
purpose  restrictions.  In  particular,  enforcement  requires  the  ability  to  determine  that  the  organization  under 
scrutiny obeys at least two classes of purpose restrictions. As shown in the example rule from Yahoo!, the
first  requirement  is  that  the  organization  does  not  use  certain  sensitive  information  for  a  given  purpose.  The 
second,  as  the  example  rule  from  HIPAA  shows,  is  that  the  organization  uses  certain  sensitive  information 
only  for  a  given  list  of  purposes.  We  call  the  first  class  of  restrictions  prohibitive  rules  (not-for)  and  the 
second  class  exclusivity  rules  (only-for).  A  prohibitive  rule  disallows  an  action  for  a  particular  purpose.  An 
exclusivity  rule  disallows  an  action  for  every  purpose  other  than  the  exceptions  the  rule  lists.  Each  class  of 
rule  requires  determining  whether  the  organization’s  behavior  is  for  a  purpose,  but  they  differ  in  whether 
this  indicates  a  violation  or  compliance,  respectively. 
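
The two rule classes can be sketched as simple checks. This is only a minimal illustration, assuming the set of purposes an action is for is already known; the function names and purpose labels are hypothetical and are not part of the formalism developed later.

```python
from typing import AbstractSet


def violates_prohibitive(purposes: AbstractSet[str], forbidden: str) -> bool:
    """A not-for rule is violated when the action is for the forbidden purpose."""
    return forbidden in purposes


def violates_exclusivity(purposes: AbstractSet[str], allowed: AbstractSet[str]) -> bool:
    """An only-for rule is violated when the action is for any purpose outside the list."""
    return bool(purposes - allowed)


# Hypothetical purpose labels for two uses of sensitive information.
email_use = {"marketing"}
record_access = {"treatment"}

print(violates_prohibitive(email_use, "marketing"))                   # True
print(violates_exclusivity(record_access, {"treatment", "billing"}))  # False
```

In both checks the hard part is the input: deciding which purposes an action is for, which is the question the rest of this work addresses.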

For  example,  consider  a  physician  accessing  a  medical  record.  Under  the  HIPAA  Privacy  Rule,  the 
physician  may  access  the  record  only  for  certain  purposes  such  as  treatment,  research,  and  billing.  Thus, 
for a trusted auditor (either internal or external) to determine whether the physician has obeyed the
Privacy Rule, the auditor must determine the purposes for which the physician accessed the record. The
auditor’s  ability  to  determine  the  purposes  behind  actions  is  limited  since  the  auditor  can  only  observe  the 
behavior  of  the  physician.  As  a  physician  may  perform  the  exact  same  actions  for  different  purposes,  the 
auditor  can  never  be  sure  of  the  purposes  behind  an  action.  However,  if  the  auditor  determines  that  the 
record  access  could  not  have  possibly  been  for  any  of  the  purposes  allowed  under  the  Privacy  Rule,  then  the 
auditor  knows  that  the  physician  violated  the  policy. 

Manual  enforcement  of  these  privacy  policies  is  labor  intensive  and  error  prone.  Thus,  to  reduce  costs 
and  make  their  operations  more  trustworthy,  organizations  would  like  to  automate  the  enforcement  of  the 
privacy policies governing their operations; tool support for this activity is beginning to emerge in the
market. For example, FairWarning and Cerner’s P2Sentinel offer automated services for the detection of privacy
breaches in a hospital setting [Fai, Cer]. Meanwhile, previous research has proposed formal methods to
enforce purpose restrictions [AKSX02, BBL05, HA05, AF07, BL08, PGY08, JSNS09, NBL+10, EKWB11].

However,  each  of  these  endeavors  starts  by  assuming  that  actions  or  sequences  of  actions  are  labeled 
with  the  purposes  they  are  for.  They  avoid  analyzing  the  meaning  of  purpose  and  provide  no  method  of 
performing  this  labeling  other  than  through  intuition  alone.  The  absence  of  a  formal  semantics  to  guide  this 
determination  has  hampered  the  development  of  methods  for  ensuring  policy  compliance.  Such  a  definition 
would  provide  insights  into  how  to  develop  tools  that  identify  suspicious  accesses  in  need  of  detailed  auditing 
and  algorithms  for  determining  which  purposes  an  action  could  possibly  be  for.  Such  a  definition  would  also 
show  which  enforcement  approaches  are  most  accurate.  More  fundamentally,  such  a  definition  could  frame 
the  scientific  basis  of  a  societal  and  legal  understanding  of  purpose  and  of  privacy  policies  that  use  the 
notion  of  purpose.  Such  a  foundation  can,  for  example,  guide  implementers  as  they  codify  in  software  an 
organization’s  interpretation  of  internal  and  government-imposed  privacy  policies. 


1.2  Motivation  of  Our  Approach 

We  start  with  an  informal  example  that  suggests  that  an  action  is  for  a  purpose  if  the  action  is  part  of  a 
plan  for  achieving  that  purpose.  Consider  a  physician  working  at  a  hospital  who,  as  a  specialist,  also  owns 
a  private  practice  that  tests  for  bone  damage  using  a  novel  technique  for  extracting  information  from  X-ray 
images. After seeing a patient and taking an X-ray, the physician forwards the patient’s medical record,
including the X-ray, to his private practice to apply this new technology. As this action entails the transmission
of  protected  health  information,  the  physician  will  have  violated  HIPAA  if  this  transmission  is  not  for  one 
of  the  purposes  HIPAA  allows.  The  physician  would  also  run  afoul  of  the  hospital’s  own  policies  governing 
when outside consultations are permissible unless this action was for a legitimate purpose. Finally, the
patient’s insurance will only reimburse the costs associated with this consultation if a medical reason (purpose)
exists  for  them.  The  physician  claims  that  this  consultation  was  for  reaching  a  diagnosis.  As  such,  it  is  for 
the  purpose  of  treatment  and,  therefore,  allowed  under  each  of  these  policies.  The  hospital  auditor,  however, 
has  selected  this  action  for  investigation  since  the  physician’s  making  a  referral  to  his  own  private  practice 
makes  possible  the  alternate  motivation  of  profit. 

Whether the physician violated these policies depends upon details not presented in the above
description. For example, we would expect the auditor to ask questions such as:

1.  Was  the  test  relevant  to  the  patient’s  condition? 

2.  Did  the  patient  benefit  medically  from  having  the  test? 

3.  Was  this  test  the  best  option  for  the  patient? 

We  will  introduce  these  details  as  we  introduce  each  of  the  factors  relevant  to  the  purposes  behind  the 
physician’s  actions. 

States  and  Actions.  Sometimes  the  purposes  for  which  an  agent  takes  an  action  depend  upon  the  previous 
actions  and  the  state  of  the  system.  In  the  above  example,  whether  the  test  is  relevant  depends  upon  the 
condition  of  the  patient,  that  is,  the  state  that  the  patient  is  in. 

While  an  auditor  could  model  the  act  of  transmitting  the  record  as  two  (or  more)  different  actions  based 
upon  the  state  of  the  patient,  modeling  two  concepts  with  one  formalism  could  introduce  errors.  A  better 
approach  is  to  model  the  state  of  the  system.  The  state  captures  the  context  in  which  the  physician  takes  an 
action  and  enables  the  purposes  of  an  action  to  depend  upon  the  actions  that  precede  it. 

The physician’s own actions also affect the state of the system and, thus, the purposes that his
actions are for. For example, had the physician transmitted the patient’s medical record before taking the X-ray,
then  the  transmission  could  not  have  been  for  treatment  since  the  physician’s  private  practice  only  operates 
on  X-rays  and  would  have  no  use  for  the  record  without  the  X-ray. 

The  above  example  illustrates  that  when  an  action  is  for  a  purpose,  the  action  is  part  of  a  sequence  of 
actions  that  can  lead  to  a  state  in  which  some  goal  associated  with  the  purpose  is  achieved.  In  the  example, 
the  goal  is  reaching  a  diagnosis.  Only  when  the  X-ray  is  first  added  to  the  record  is  this  goal  reached. 

Non-redundancy.  Some  actions,  however,  may  be  part  of  such  a  sequence  without  actually  being  for 
the  purpose.  For  example,  suppose  that  the  patient’s  X-ray  clearly  shows  the  patient’s  problem.  Then,  the 
physician  can  reach  a  diagnosis  without  sending  the  record  to  the  private  practice.  Thus,  while  both  taking 
the  X-ray  and  sending  the  medical  record  might  be  part  of  a  sequence  of  actions  that  leads  to  achieving  a 
diagnosis,  the  transmission  does  not  actually  contribute  to  achieving  the  diagnosis:  the  physician  could  omit 
it  and  the  diagnosis  could  still  be  reached. 

From  this  example,  it  may  be  tempting  to  conclude  that  an  action  is  for  a  purpose  only  if  that  action  is 
necessary  to  achieve  that  purpose.  However,  consider  a  physician  who,  to  reach  a  diagnosis,  must  either 
send  the  medical  record  to  a  specialist  or  take  an  MRI.  In  this  scenario,  the  physician’s  sending  the  record  to 
the  specialist  is  not  necessary  since  he  could  take  an  MRI.  Likewise,  taking  the  MRI  is  not  necessary.  Yet, 
the  physician  must  do  one  or  the  other  and  that  action  will  be  for  the  purpose  of  diagnosis.  Thus,  an  action 
may be for a purpose without being necessary for achieving the purpose.

Rather than necessity, we use the weaker notion of non-redundancy found in work on the semantics of
causation  (e.g.,  [Mac74]).  Given  a  sequence  of  actions  that  achieves  a  goal,  an  action  in  it  is  redundant 
if  that  sequence  with  that  action  removed  (and  otherwise  unchanged)  also  achieves  the  goal.  An  action  is 
non-redundant  if  removing  that  action  from  the  sequence  would  result  in  the  goal  no  longer  being  achieved. 
Thus,  non-redundancy  may  be  viewed  as  necessity  under  an  otherwise  fixed  sequence  of  actions. 
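
The definition above can be sketched directly in code. This is a minimal illustration, assuming the goal is given as a predicate over action sequences; the action names and the two goal predicates are hypothetical stand-ins for the physician example.

```python
from typing import Callable, Sequence

Goal = Callable[[Sequence[str]], bool]


def is_redundant(actions: Sequence[str], index: int, achieves_goal: Goal) -> bool:
    """True if dropping actions[index] (and changing nothing else) still achieves the goal."""
    if not achieves_goal(actions):
        raise ValueError("the full sequence must achieve the goal")
    remaining = list(actions[:index]) + list(actions[index + 1:])
    return achieves_goal(remaining)


def needs_specialist(seq):
    # Hypothetical goal: diagnosis requires an X-ray followed by the specialist's analysis.
    return "take_xray" in seq and "send_record" in seq[seq.index("take_xray"):]


def xray_suffices(seq):
    # Hypothetical goal: the X-ray alone reveals the diagnosis.
    return "take_xray" in seq


plan = ["take_xray", "send_record"]
print(is_redundant(plan, 1, needs_specialist))  # False: the transmission is non-redundant
print(is_redundant(plan, 1, xray_suffices))     # True: the transmission is redundant
```

The same sequence of actions yields different answers under the two goals, which mirrors the point that redundancy depends on the state of the patient and not on the actions alone.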

For  example,  suppose  the  physician  decides  to  send  the  medical  record  to  the  specialist.  Then,  the 
sequence  of  actions  modified  by  removing  this  action  would  not  lead  to  a  state  in  which  a  diagnosis  is 
reached. Thus, the transmission of the medical record to the specialist is non-redundant. However, had
the  X-ray  revealed  to  the  physician  the  diagnosis  without  needing  to  send  it  to  a  specialist,  the  sequence 
of  actions  that  results  from  removing  the  transmission  from  the  original  sequence  would  still  result  in  a 
diagnosis.  Thus,  the  transmission  would  be  redundant. 


Quantitative  Purposes.  Above  we  implicitly  presumed  that  the  diagnosis  from  either  the  specialist  or  an 
MRI  had  equal  quality.  This  need  not  be  the  case.  Indeed,  many  purposes  are  actually  fulfilled  to  varying 
degrees.  For  example,  the  purpose  of  marketing  is  never  completely  achieved  since  there  is  always  more 
marketing  to  do.  Thus,  we  model  a  purpose  by  assigning  to  each  state-action  pair  a  number  that  describes 
how  well  that  action  fulfills  that  purpose  when  performed  in  that  state.  We  require  that  the  physician  selects 
the test that maximizes the quality of the diagnosis as determined by the total purpose score accumulated over
all  his  actions. 

We  must  adjust  our  notion  of  non-redundancy  accordingly.  An  action  is  non-redundant  if  removing  that 
action  from  the  sequence  would  result  in  the  purpose  being  satisfied  less.  Now,  even  if  the  physician  can 
make  a  diagnosis  himself,  sending  the  record  to  a  specialist  would  be  non-redundant  if  getting  a  second 
opinion  improves  the  quality  of  the  diagnosis. 
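
A small sketch may make the quantitative notion concrete. It assumes a deterministic system, with a score attached to each state-action pair and summed over the run; the states, actions, and scores below are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Key = Tuple[str, str]  # (state, action)


def total_score(actions: Sequence[str], start: str,
                step: Dict[Key, str], reward: Dict[Key, float]) -> float:
    """Sum the purpose score over a deterministic run; unlisted pairs have no effect."""
    state, score = start, 0.0
    for action in actions:
        score += reward.get((state, action), 0.0)
        state = step.get((state, action), state)
    return score


def is_non_redundant(actions, index, start, step, reward) -> bool:
    """True if removing actions[index] lowers the total purpose score."""
    remaining = list(actions[:index]) + list(actions[index + 1:])
    return total_score(remaining, start, step, reward) < total_score(actions, start, step, reward)


# Hypothetical model: a second opinion adds to the quality of the diagnosis.
STEP = {("start", "take_xray"): "xrayed",
        ("xrayed", "diagnose"): "diagnosed",
        ("diagnosed", "second_opinion"): "confirmed"}
REWARD = {("xrayed", "diagnose"): 5.0, ("diagnosed", "second_opinion"): 2.0}

plan = ["take_xray", "diagnose", "second_opinion", "file_paperwork"]
print(is_non_redundant(plan, 2, "start", STEP, REWARD))  # True
print(is_non_redundant(plan, 3, "start", STEP, REWARD))  # False
```

Here the second opinion is non-redundant because dropping it lowers the score from 7 to 5, while the paperwork action changes nothing and is therefore redundant.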


Probabilistic  Systems.  The  success  of  many  medical  tests  and  procedures  is  probabilistic.  For  example, 
with  some  probability  the  physician’s  test  may  fail  to  reach  a  diagnosis.  The  physician  would  still  have 
transmitted the medical record for the purpose of diagnosis even if the test failed to reach one. This
possibility affects our semantics of purpose: now an action may be for a purpose even if that purpose is never
achieved. 

To account for such probabilistic events, we model the outcome of the physician’s actions as
probabilistic. For an action to be for a purpose, we require that there be a non-zero probability of the purpose
being  achieved  and  that  the  physician  attempts  to  maximize  the  expected  reward.  In  essence,  we  require  that 
the  physician  attempts  to  achieve  a  diagnosis.  Thus,  the  auditee’s  plan  determines  the  purposes  behind  his 
actions  rather  than  just  the  actions  themselves. 


Overview  of  Approach.  This  example  illustrates  key  factors  in  determining  whether  an  action  is  for  a 
purpose.  In  particular,  the  auditor  should  model  the  auditee  as  an  agent  that  interacts  with  an  environment 
model. The environment model shows how the actions the auditee can perform affect the state of the
environment. It also models how well each state and action satisfies each purpose that the modeled auditee
might  possibly  find  motivating.  Limiting  consideration  to  one  purpose,  the  environment  model  becomes  a 
Markov  Decision  Process  (MDP)  where  the  degree  of  satisfaction  of  that  purpose  is  the  reward  function  of 
the  MDP.  If  the  auditee  is  motivated  to  act  by  only  that  purpose,  then  the  auditee’s  actions  must  correspond 
to an optimal plan for this MDP and these actions are for that purpose. Additionally, we use a stricter
definition of optimal than used for standard MDPs to reject redundant actions that neither decrease nor increase
the  total  reward. 

In  this  example,  the  auditor  would  examine  an  MDP  modeling  the  physician’s  environment  with  the 
quality  of  treatment  as  the  reward  function  to  be  optimized.  If  no  optimal  plans  for  this  MDP  involve 
ordering  the  test,  then  the  auditor  can  conclude  definitively  that  the  physician  did  not  order  the  test  for 
treatment. 
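
The auditor's check can be sketched with ordinary value iteration. This is a minimal illustration under stated assumptions: it uses the standard notion of optimality rather than the stricter one that rejects redundant actions, and the states, actions, probabilities, rewards, and discount factor are all hypothetical.

```python
from typing import Dict, List, Tuple

Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]
Rewards = Dict[str, Dict[str, float]]


def q_values(trans: Transitions, rewards: Rewards,
             gamma: float = 0.9, tol: float = 1e-9) -> Dict[str, Dict[str, float]]:
    """Value iteration: expected total discounted reward of each state-action pair."""
    values = {s: 0.0 for s in trans}
    while True:
        q = {s: {a: rewards[s][a] + gamma * sum(p * values[t] for p, t in outcomes)
                 for a, outcomes in acts.items()}
             for s, acts in trans.items()}
        new_values = {s: max(q[s].values()) for s in trans}
        delta = max(abs(new_values[s] - values[s]) for s in trans)
        values = new_values
        if delta < tol:
            return q


def could_be_for(q, state: str, action: str, eps: float = 1e-6) -> bool:
    """True if the action is optimal in the state, so some optimal plan chooses it."""
    return q[state][action] >= max(q[state].values()) - eps


# Hypothetical model of the physician's environment; reward is diagnosis quality.
TRANS = {
    "undiagnosed": {"read_xray": [(0.9, "diagnosed"), (0.1, "undiagnosed")],
                    "send_to_practice": [(0.5, "diagnosed"), (0.5, "undiagnosed")]},
    "diagnosed": {"stop": [(1.0, "diagnosed")]},
}
REWARDS = {"undiagnosed": {"read_xray": 9.0, "send_to_practice": 5.0},
           "diagnosed": {"stop": 0.0}}

Q = q_values(TRANS, REWARDS)
print(could_be_for(Q, "undiagnosed", "read_xray"))         # True
print(could_be_for(Q, "undiagnosed", "send_to_practice"))  # False
```

Under this toy model, no optimal plan sends the record to the private practice, so an auditor using it would conclude that the transmission was not for treatment. A different model could of course reverse that verdict.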


1.3  Statement  of  Thesis 

The  goal  of  this  work  is  to  study  the  meaning  of  purpose  in  the  context  of  enforcing  privacy  policies.  We 
aim  to  provide  formal  definitions  suitable  for  automating  the  enforcement  of  purpose  restrictions.  Since 
post-hoc  auditing  provides  the  perspective  often  required  to  determine  the  purpose  of  an  action,  we  focus  on 
automated  auditing.  However,  we  believe  our  semantics  is  applicable  to  other  enforcement  mechanisms  and 
may  also  clarify  informal  reasoning.  For  example,  in  Chapter  4,  we  use  it  to  create  an  operating  procedure 
that  encourages  compliance  with  a  purpose  restriction. 

We  find  that  planning  is  central  to  the  meaning  of  purpose.  We  see  the  role  of  planning  in  the  definition 
of  the  sense  of  the  word  “purpose”  most  relevant  to  our  work  [SW89]: 

The  object  for  which  anything  is  done  or  made,  or  for  which  it  exists;  the  result  or  effect 
intended  or  sought;  end,  aim. 

Similarly, work on cognitive psychology calls purpose “the central determinant of behavior” [DKP96, p. 19].
If  our  auditors  are  concerned  with  rational  auditees  (the  person  or  organization  being  audited),  then  we 
may  assume  the  auditee  uses  a  plan  to  determine  what  actions  it  will  perform  in  its  attempt  to  achieve  its 
purposes.  We  (as  have  philosophers  [Tay66])  conclude  that  if  an  auditee  chooses  to  perform  an  action  a 
while  planning  to  achieve  the  purpose  p,  then  the  auditee’s  action  a  is  for  the  purpose  p. 

Our goal is to make these notions formal in a manner useful for automation and computation. In
particular, this dissertation argues the following thesis:

A model of planning underlies a formalization of purpose restrictions that enables their
automated enforcement.

As  suggested  by  the  example  in  the  previous  section,  we  start  by  using  the  MDP  formalism  as  the  model  of 
planning.  However,  when  we  consider  purpose  restrictions  on  information  use,  we  use  Partially  Observable 
Markov  Decision  Processes  (POMDPs)  instead.  In  either  case,  we  compare  the  behaviors  of  the  auditee  as 
recorded  in  the  log  to  how  the  auditee  would  behave  when  selecting  a  plan  of  action  using  a  model  for  the 
purpose  in  question. 

To  argue  our  thesis,  this  dissertation  presents  various  contributions  as  summarized  in  Section  1.5  and 
is  structured  as  explained  in  Section  1.6.  In  particular,  we  provide  a  formal  semantics  using  a  planning 
model  to  determine  whether  a  sequence  of  actions  is  for  a  purpose,  and  we  build  upon  this  formalization 
algorithms  for  applying  auditing  to  purpose  restrictions.  Before  discussing  these  contributions,  we  put  our 
thesis in context by summarizing prior work.

1.4 Prior Work


In  this  section,  we  provide  a  brief  overview  of  prior  work  to  show  that  prior  work  has  not  demonstrated  our 
thesis.  In  Chapter  7,  we  cover  prior  work  in  more  detail. 

The  planning-based  semantics  of  purpose  we  use  follows  from  informal  philosophical  inquiry  [Tay66]. 
Taylor  notes  that  whether  an  action  is  for  a  purpose  depends  upon  the  plan  that  leads  the  actor  to  perform 
the  action  and  not  on  whether  the  actor  succeeds  in  furthering  that  purpose.  While  presenting  numerous 
examples illustrating the distinction, this prior work did not provide formal definitions, discuss
information use, formalize purpose restrictions, discuss their enforcement, provide algorithms, or present empirical
validation. 

Works that do provide formal models fall into one of two strands: enforcing purpose restrictions
and  goal  inference.  Our  work  builds  on  both  of  these  strands. 


Enforcing Purpose Restrictions. Most prior work on using formal methods for enforcing purpose
restrictions has focused on when observable actions further a purpose [AKSX02, BBL05, AF07, BL08, PGY08,
JSNS09,  NBL+10,  EKWB11].  These  works  do  not  empirically  show  that  their  formalism  corresponds  to 
the  actual  meaning  of  purpose  restrictions.  Our  thesis  argues  that  an  action  is  for  a  purpose  when  that  action 
is  part  of  a  plan  for  that  purpose,  as  opposed  to  furthering  that  purpose.  Furthermore,  none  of  these  works 
formalize  information  use. 

The prior work of Hayati and Abadi on enforcing purpose restrictions does not fit into the above mold [HA05].
It provides a type system for tracking information flow in programs with purpose restrictions in mind.
However, their work does not formalize when information use is for a purpose since it presupposes that the
programmer can determine whether a function uses information for a certain purpose and provides no
formal guidance for making this determination.

A  closely  related  enforcement  problem  is  that  of  minimal  disclosure,  which  requires  that  the  amount 
of  information  used  in  granting  a  request  for  access  should  be  as  little  as  possible  while  still  achieving  the 
purpose  behind  the  request.  However,  purpose  restrictions  do  not  require  the  amount  of  information  used 
to  be  minimal  and  often  involve  purposes  that  are  never  fully  achieved  (e.g.,  more  marketing  is  always 
possible).  Furthermore,  works  on  minimal  disclosure  [MMZ06,  BMDS07]  do  not  use  a  planning-based 
formalism.  They  also  lack  the  probabilistic  transitions  necessary  to  see  the  distinction  between  information 
use  furthering  a  purpose  and  being  part  of  a  plan  for  furthering  a  purpose. 


Goal  Inference.  The  essence  of  our  formalization  of  purpose  restrictions  is  to  reduce  the  problem  to  one 
of  goal  inference.  Goal  inference  is  the  problem  of  determining,  from  the  actions  and  states  of  a  planning 
agent,  the  goal  that  agent  is  pursuing.  Under  our  formalization,  the  auditee  is  the  planning  agent  and  the 
possible  purposes  are  the  possible  goals. 

The previous work on goal inference most closely related to ours uses models similar to the MDP and
POMDP models we use [RSM04, SGBR04, VR05, BTS06, VR06, BTS07, BST09, BKvdWvR10, BST11,
RG11]. However, none of these works provides a goal inference algorithm suitable for our auditing task.
In particular, they are each concerned with determining the probability that a sequence of actions is for a
purpose,  whereas  we  are  concerned  with  whether  an  action  or  use  of  information  could  be  for  a  purpose. 
Thus,  we  must  develop  a  formalism  for  information  use  and  a  method  of  determining  when  a  plan  depends 
upon information use. We must also concern ourselves with the soundness of our audit algorithm rather than
its  accuracy  in  terms  of  a  predicted  probability. 

1.5  Summary  of  Contributions 

To  argue  our  thesis,  we  offer  the  following  novel  contributions: 

1.  A  formal  treatment  of  purpose  restrictions  on  actions  using  a  planning-based  formalism  of  purpose; 

2.  A  semantic  formalism  of  when  information  use  is  for  a  purpose; 

3. An empirical validation that our planning-based formalism closely corresponds to how people
understand the word “purpose” as used in purpose restrictions;

4.  An  algorithm  and  its  implementation  for  auditing  employing  our  formalism; 

5.  The  application  of  our  formalism  to  aid  the  understanding  of  privacy  concerns  found  in  the  healthcare 
domain;  and 

6.  The  characterization  of  previous  policy  enforcement  methods  in  our  formalism  and  a  comparative 
study  of  their  expressiveness. 

We  believe  that  these  contributions  together  are  sufficient  to  demonstrate  our  thesis.  In  particular,  the  first 
three  contributions  illustrate  that  planning  can  formalize  purpose  restrictions.  The  last  three  illustrate  that 
our  formalism  may  aid  automated  auditing  and  analysis.  Our  success,  however,  must  be  qualified  by  the 
limited  ability  of  our  formalism  to  handle  multiple  purposes  and  the  intricacies  of  human  planning. 

Although  motivated  by  our  goal  to  formalize  the  notions  of  use  and  purpose  prevalently  found  in  privacy 
policies,  our  work  is  more  generally  applicable  to  a  broad  range  of  policies,  such  as  fiscal  policies  governing 
travel  reimbursement  or  statements  of  ethics  proscribing  conflicts  of  interest. 

1.6  Structure  of  Dissertation 

Chapter  2  discusses  when  a  sequence  of  actions  is  for  a  purpose.  In  Section  2.1,  we  present  a  formalism 
providing  a  semantics  to  purpose  restrictions  based  upon  planning  with  MDPs.  Section  2.2  provides  an 
auditing  method  and  discusses  the  ramifications  of  the  auditor  observing  only  the  behaviors  of  the  auditee 
and  not  the  underlying  planning  process  of  the  auditee  that  resulted  in  these  behaviors.  We  show  that  in 
some  circumstances,  the  auditor  can  still  acquire  enough  information  to  determine  that  the  auditee  violated 
the  privacy  policy.  To  do  so,  the  auditor  must  first  use  our  MDP  model  to  construct  all  the  possible  behaviors 
that the privacy policy allows and then compare them with all the behaviors of the auditee that could have
resulted  in  the  observed  auditing  log.  Section  2.3  presents  an  algorithm  for  auditing  based  on  our  formal 
definitions,  illustrating  the  relevance  of  our  work. 

In  Chapter  3,  we  extend  our  formalism  to  answer  the  question  of  when  information  use  is  for  a  purpose. 
Many  uses  of  information  may  be  modeled  as  an  action,  which  makes  the  formalism  of  Chapter  2  applicable. 
However,  this  formalism  cannot  detect  when  information  is  used  by  the  planning  process  itself.  Thus,  we 
extend  the  formalism  to  use  Partially  Observable  Markov  Decision  Processes  (POMDPs)  that  can  capture 
such information usage. The explicitness of partial observations in the POMDP model allows us to consider
how  the  agent  would  plan  if  some  observations  were  conflated  to  ignore  information  of  interest.  We  provide 
an  algorithm  for  auditing  that  tests  whether  an  agent  uses  information  for  a  purpose  by  comparing  the 
behaviors  of  the  agent  to  the  behaviors  it  would  manifest  had  it  planned  its  actions  in  this  simulated  state  of 
ignorance. 
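
The idea of conflating observations can be illustrated with a much simpler check than the algorithm of Chapter 3. The sketch below ignores optimality for a purpose altogether and assumes deterministic strategies; it asks only whether the logged choices could have come from an agent that cannot tell the conflated observations apart. The log format, observation labels, and conflation function are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Run = Sequence[Tuple[str, str]]  # one logged run: (observation, action) pairs


def consistent_with_ignorance(runs: Sequence[Run],
                              conflate: Callable[[str], Hashable]) -> bool:
    """True if one deterministic strategy over conflated observation histories explains all runs."""
    chosen: Dict[Tuple[Hashable, ...], str] = {}
    for run in runs:
        history: Tuple[Hashable, ...] = ()
        for observation, action in run:
            history += (conflate(observation),)
            if chosen.setdefault(history, action) != action:
                return False  # same conflated history, different action
    return True


def hide_insurance(observation: str) -> str:
    # Hypothetical conflation: drop the insurance status from the observation.
    return observation.split("/")[0]


LOG: List[Run] = [
    [("fracture/insured", "order_mri")],
    [("fracture/uninsured", "order_xray")],
]
print(consistent_with_ignorance(LOG, hide_insurance))  # False
```

A result of False means the behavior cannot be explained without the hidden information, which is the kind of evidence the comparison against a simulated state of ignorance is meant to surface.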

To  validate  our  work,  we  consider  its  application  and  perform  an  empirical  study.  In  Chapter  4,  we 
address  a  concern  in  the  healthcare  domain  involving  the  rise  of  Regional  Health  Information  Organizations 
(RHIOs).  We  use  the  formalism  of  Chapter  2  to  create  an  operating  procedure  that  encourages  compliance 
with  a  purpose  restriction. 

In  Chapter  5,  we  present  the  results  of  a  survey  testing  how  people  understand  the  word  “purpose”.  The 
survey  compares  our  planning-based  approach  to  the  prior  approach  based  on  whether  an  action  improves 
the  satisfaction  of  a  purpose.  We  find  that  our  approach  matches  the  survey  participants’  responses  much 
more  closely  than  the  prior  approach. 

Most  auditees  are  actually  interested  in  multiple  purposes  and  select  plans  that  simultaneously  satisfy 
as  many  of  the  desired  purposes  as  possible.  Handling  the  interactions  among  purposes  complicates  our 
semantics.  In  particular,  actions  selected  by  a  single  plan  may  be  for  different  purposes.  In  Chapter  6,  we 
present  examples  showing  when  our  semantics  can  extend  to  handle  multiple  purposes  and  when  difficulties 
arise in determining which purposes an action is for. Currently, the state of the art in the understanding
of  human  planning  limits  our  abilities  to  improve  upon  our  semantics.  However,  as  this  understanding 
improves,  one  may  replace  our  formalism  based  on  MDPs  and  POMDPs  with  more  detailed  ones  while 
retaining  our  general  framework  of  defining  purpose  restrictions  in  terms  of  planning. 

Chapter  7  discusses  related  work.  Even  without  a  formalism  for  multiple  purposes,  our  work  is  sufficient 
to  put  the  previous  work  on  enforcing  privacy  policies  on  firm  semantic  ground.  In  Section  7.1,  we  use  our 
formalism  to  discuss  the  strengths  and  weaknesses  of  each  such  approach.  In  particular,  we  find  that  each 
approach  enforces  the  policy  given  the  set  of  all  possible  allowed  behaviors,  which  is  a  set  that  our  approach 
can  construct.  We  also  compare  the  previous  auditing  approaches,  which  differ  in  their  trade-offs  between 
auditing  complexity  and  accuracy  of  representing  this  set  of  behaviors.  The  remaining  sections  discuss 
works  related  to  ours  by  methodology  rather  than  goals. 

We  end  in  Chapter  8  by  presenting  interesting  directions  for  future  work  and  conclusions.  Appendix  A 
provides additional background on POMDPs. Appendix B presents details about the empirical study.
Appendix C summarizes notation.


Chapter  2 


Action  for  a  Purpose 


2.1  Planning  for  a  Purpose 

In  this  section,  we  present  a  formalism  for  planning  that  accounts  for  quantitative  purposes,  probabilistic 
systems  and  non-redundancy.  We  first  review  Markov  Decision  Processes  (MDPs) — a  natural  model  for 
planning  with  probabilistic  systems.  In  general,  an  agent  planning  for  some  purpose  constructs  an  MDP 
to  help  select  its  actions.  The  MDP  models  the  agent’s  environment  and  how  the  agent’s  actions  affect 
the  environment’s  state.  We  use  the  reward  function  of  the  MDP  to  quantify  the  degree  of  satisfaction  of 
a  purpose  upon  taking  an  action  from  a  state.  The  agent  selects  a  plan  that  determines  for  each  state,  the 
action  that  the  agent  will  perform  if  the  agent  reaches  that  state.  The  plan  the  agent  selects  optimizes  the 
expected  total  discounted  reward  (degree  of  purpose  satisfaction)  under  the  MDP. 

We  then  develop  a  stricter  definition  of  optimal  than  used  with  standard  MDPs.  We  use  this  definition  to 
create models we call “NMDPs” for Non-redundant MDPs. In addition to requiring that strategies optimize
the  expected  total  discounted  reward,  NMDPs  exclude  strategies  that  employ  redundant  actions  that  neither 
decrease  nor  increase  the  total  reward.  We  end  with  an  example  illustrating  the  use  of  an  NMDP  to  model 
an  audited  environment. 

2.1.1  Markov  Decision  Processes 

An  MDP  may  be  thought  of  as  a  probabilistic  automaton  where  each  transition  is  labeled  with  a  reward  in 
addition  to  an  action.  Rather  than  having  accepting  or  goal  states,  the  “goal”  of  an  MDP  is  to  maximize  the 
total  reward  over  time.  Furthermore,  we  distinguish  between  the  MDP,  which  is  a  model  of  an  environment, 
and  the  agent,  which  is  an  entity  using  the  model  to  select  its  actions.  Thus,  while  it  is  convenient  to  speak 
informally  of  actions  arising  from  an  MDP,  strictly  speaking  actions  are  performed  by  an  agent  because  of 
the  agent’s  use  of  the  MDP  model  to  select  these  actions. 

To define MDPs, let $\mathsf{Dist}(X)$ denote the space of all distributions over the set $X$.
That is, $\mu \in \mathsf{Dist}(X)$ is a function from $X$ to the reals between 0 and 1 that obeys the standard axioms of
probability theory, making it a distribution over $X$. An MDP is a tuple $m = \langle \mathcal{S}, \mathcal{A}, t, r, \gamma \rangle$ where

•  $\mathcal{S}$ is a set of states;

•  $\mathcal{A}$ is a set of actions;

•  $t : \mathcal{S} \times \mathcal{A} \to \mathsf{Dist}(\mathcal{S})$, a transition function from a state and an action to a distribution over states;

•  $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, a reward function; and

•  $\gamma$, a discount factor such that $0 < \gamma < 1$.

where $\mathbb{R}$ is the set of real numbers. For each state $s$ in $\mathcal{S}$, the agent using the MDP to plan selects an
action $a$ from $\mathcal{A}$ to perform. Upon performing the action $a$ in the state $s$, the agent receives the reward
$r(s, a)$. The environment then transitions to a new state $s'$ with probability $\mu(s')$, where $\mu$ is the distribution
provided by $t(s, a)$. The goal of the agent is to select actions to maximize its expected total discounted
reward $\mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i \rho_i\right]$, where $i \in \mathbb{N}$ (the set of natural numbers) ranges over time modeled as discrete steps,
$\rho_i$ is the reward at time $i$, and the expectation is taken over the probabilistic transitions. The discount factor
$\gamma$ accounts for the preference of people to receive rewards sooner rather than later. It may be thought of as similar
to inflation. We require that $\gamma < 1$ to ensure that the expected total discounted reward is bounded.
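The effect of discounting can be illustrated with a minimal Python sketch. The function names and the value 0.9 for the discount factor are assumptions made for illustration only; the sketch sums a finite reward sequence and states the geometric-series bound that holds when every reward has magnitude at most `r_max`.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**i * rewards[i] over a finite reward sequence."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must satisfy 0 < gamma < 1")
    return sum(gamma ** i * rho for i, rho in enumerate(rewards))


def return_bound(r_max: float, gamma: float) -> float:
    """Geometric-series bound r_max / (1 - gamma) on a discounted return
    whose individual rewards never exceed r_max in magnitude."""
    return r_max / (1 - gamma)


gamma = 0.9  # assumed value, chosen only for illustration
early = discounted_return([0, 12], gamma)     # reward of 12 at step 1
late = discounted_return([0, 0, 12], gamma)   # same reward delayed to step 2
```

Delaying the same reward by one step multiplies its contribution by the discount factor, which is why the later reward is worth less.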

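We formalize the agent’s plan as a stationary strategy (commonly called a “policy”, but we reserve that
word for privacy policies). A stationary strategy is a function $\sigma$ from the state space $\mathcal{S}$ to the set $\mathcal{A}$ of actions
(i.e., $\sigma : \mathcal{S} \to \mathcal{A}$) such that at a state $s$ in $\mathcal{S}$, the agent always selects to perform the action $\sigma(s)$. The value
of a state $s$ under a strategy $\sigma$ is

\[ v_m(\sigma, s) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i \, r(s_i, \sigma(s_i))\right] \]

where $s_i$ is the $i$th state reached starting from $s_0 = s$.
($v_m$ is typically written as $V_m$, but we reserve $V_m$ for POMDPs in Chapter 3.) The Bellman equation [Bel52]
shows that

\[ v_m(\sigma, s) = r(s, \sigma(s)) + \gamma \sum_{s' \in \mathcal{S}} t(s, \sigma(s))(s') \cdot v_m(\sigma, s') \]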


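A strategy $\sigma^*$ is optimal if and only if for all states $s$, $v_m(\sigma^*, s) = \max_{\sigma} v_m(\sigma, s)$. At least one optimal
strategy always exists (see, e.g., [RN03]). Furthermore, if $\sigma^*$ is optimal, then

\[ \sigma^*(s) \in \operatorname{argmax}_{a \in \mathcal{A}} \Big[ r(s, a) + \gamma \sum_{s' \in \mathcal{S}} t(s, a)(s') \cdot v_m(\sigma^*, s') \Big] \]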

We denote this set of optimal strategies as $\mathsf{opt}(\langle \mathcal{S}, \mathcal{A}, t, r, \gamma \rangle)$, or when the transition system is clear from
context, as $\mathsf{opt}(r)$. Such strategies are sufficient to maximize the agent’s expected total discounted reward
despite only depending upon the current state of the MDP.
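A minimal sketch of how the per-state optimal actions might be computed is given below. It uses standard value iteration, which is one common technique and not necessarily the one the author used; the dictionary encoding of $t$ and $r$, the tolerance values, and the two-state toy model are all assumptions made for illustration.

```python
def q_values(t, r, gamma, v):
    """q(s, a) = r(s, a) + gamma * sum over s' of t(s, a)(s') * v(s')."""
    return {(s, a): r[(s, a)] + gamma * sum(p * v[s2] for s2, p in dist.items())
            for (s, a), dist in t.items()}


def value_iteration(states, t, r, gamma, tol=1e-10):
    """Iterate the Bellman optimality update until the values stop changing."""
    v = {s: 0.0 for s in states}
    while True:
        q = q_values(t, r, gamma, v)
        new_v = {s: max(q[(s2, a)] for (s2, a) in q if s2 == s) for s in states}
        if max(abs(new_v[s] - v[s]) for s in states) < tol:
            return new_v
        v = new_v


def optimal_actions(states, t, r, gamma, eps=1e-6):
    """For each state, the actions whose q-value attains the optimal value."""
    v = value_iteration(states, t, r, gamma)
    q = q_values(t, r, gamma, v)
    return {s: {a for (s2, a) in q if s2 == s and q[(s2, a)] >= v[s] - eps}
            for s in states}


# Hypothetical two-state model: working pays 1 and finishes; waiting pays 0.
states = ["start", "done"]
t = {("start", "work"): {"done": 1.0},
     ("start", "wait"): {"start": 1.0},
     ("done", "idle"): {"done": 1.0}}
r = {("start", "work"): 1.0, ("start", "wait"): 0.0, ("done", "idle"): 0.0}
v = value_iteration(states, t, r, gamma=0.5)
opt = optimal_actions(states, t, r, gamma=0.5)
```

A stationary strategy is optimal exactly when it picks, in every state, one of the actions returned for that state, so the set of optimal strategies is the product of these per-state sets.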

Under  this  formalism,  the  auditee  plays  the  role  of  the  agent  optimizing  the  MDP  to  plan.  We  presume 
that  each  purpose  may  be  modeled  as  a  reward  function.  That  is,  we  assume  the  degree  to  which  a  purpose 
is  satisfied  may  be  captured  by  a  function  from  states  and  actions  to  a  real  number.  The  higher  the  number, 
the higher the degree to which that purpose is satisfied. When the auditee wants to plan for a purpose $p$, it
uses a reward function, $r_p$, such that $r_p(s, a)$ is the degree to which taking the action $a$ from state $s$ aids
the purpose $p$. We also assume that the expected total discounted reward can capture the degree to which
a purpose is satisfied over time. We say that the auditee plans for the purpose $p$ when the auditee adopts a
strategy $\sigma$ that is optimal for the MDP $\langle \mathcal{S}, \mathcal{A}, t, r_p, \gamma \rangle$.

Executions and Behaviors. Given the strategy $\sigma$ and the actual results of the probabilistic transitions
yielded by $t$, the agent exhibits an execution. We represent this execution as an infinite sequence $e =
[s_1, a_1, s_2, a_2, \ldots]$ of alternating states and actions starting with a state, where $s_i$ is the $i$th state that the
agent was in and $a_i$ is the $i$th action the agent took, for all $i$ in $\mathbb{N}$. We call a finite prefix $b$ of an execution $e$
a behavior.

Not every sequence of states and actions is a possible execution of the agent under an MDP. For an
execution to be possible under an MDP, it must be consistent with some strategy and the transition function
$t$. We say an execution $e$ is consistent with a strategy $\sigma$ if and only if $a_i = \sigma(s_i)$ for all $i$ in $\mathbb{N}$, where $a_i$ is
the $i$th action in $e$ and $s_i$ is the $i$th state in $e$. A behavior is consistent with a strategy if it can be extended to
an execution consistent with that strategy.

To determine whether an execution is possible under $t$, let a contingency $\kappa$ be a function from $\mathcal{S} \times \mathcal{A} \times \mathbb{N}$
to $\mathcal{S}$ such that $\kappa(s, a, i)$ is the state that results from taking the action $a$ in the state $s$ as the $i$th action.
We say that a contingency $\kappa$ is consistent with an MDP if and only if $\kappa$ only picks states to which the
transition function $t$ of the MDP assigns a non-zero probability (i.e., for all $s$ in $\mathcal{S}$, $a$ in $\mathcal{A}$, and $i$ in $\mathbb{N}$,
$t(s, a)(\kappa(s, a, i)) > 0$).

Given an MDP $m$, let $m(s, \kappa)$ be the possibly infinite state model that results from having $\kappa$ resolve all the
probabilistic choices in $m$ and having the model start in state $s$. Let $m(s, \kappa, \sigma)$ denote the execution that
results from using the strategy $\sigma$ and state $s$ in the non-probabilistic model $m(s, \kappa)$. Formally, $m(s, \kappa, \sigma) =
[s_1, a_1, s_2, a_2, \ldots]$ where $s_1 = s$ and for all $i \in \mathbb{N}$, $a_i = \sigma(s_i)$ and $s_{i+1} = \kappa(s_i, a_i, i)$.
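The definition of $m(s, \kappa, \sigma)$ can be read as a generator of the execution. The sketch below is illustrative only: the dictionary encoding of the strategy, the callable encoding of the contingency, and the two-state example are assumptions, not part of the formalism.

```python
from itertools import islice


def execution(s, kappa, sigma):
    """Yield s1, a1, s2, a2, ... for m(s, kappa, sigma).

    sigma maps each state to an action (a stationary strategy);
    kappa(s, a, i) returns the state that follows the i-th action.
    """
    i = 1
    while True:
        a = sigma[s]
        yield s
        yield a
        s = kappa(s, a, i)
        i += 1


# Hypothetical model: "go" from s1 fails on the first try and succeeds later.
sigma = {"s1": "go", "s2": "stop"}


def kappa(s, a, i):
    if (s, a) == ("s1", "go"):
        return "s2" if i >= 2 else "s1"
    return s  # every other state-action pair is a self-loop


prefix = list(islice(execution("s1", kappa, sigma), 8))
```

Because an execution is infinite, only a finite prefix (a behavior) can ever be materialized, which is what `islice` does here.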

Consistent contingencies capture the idea of possible executions. Formally, we say that an execution $e =
[s_1, a_1, s_2, a_2, \ldots]$ is possible for $m$ if and only if there exists a state $s$ of $m$, a contingency $\kappa$ consistent with
$m$, and a strategy $\sigma$ for $m$ such that $e = m(s, \kappa, \sigma)$. Similarly, we say that a behavior $b = [s_1, a_1, \ldots, s_n, a_n]$
is possible for $m$ if and only if there exists a state $s$ of $m$, a contingency $\kappa$ consistent with $m$, and a strategy
$\sigma$ for $m$ such that $b \sqsubset m(s, \kappa, \sigma)$, where $\sqsubset$ denotes the proper-prefix relation. The following lemma reduces
the global property of a behavior being possible for an MDP to local properties of the MDP.

Lemma 1. For all MDPs $m$ and behaviors $b = [s_1, a_1, \ldots, s_n, a_n] \in (\mathcal{S} \times \mathcal{A})^*$, $b$ is possible for $m$ if and
only if for all $i < n$, $t(s_i, a_i)(s_{i+1}) > 0$ and for all $i \leq n$ and $j \leq n$, $s_i = s_j$ implies that $a_i = a_j$.

Proof. Suppose that $b$ is possible for $m$. Then, there exists a state $s$ of $m$, a contingency $\kappa$ consistent with
$m$, and a strategy $\sigma$ for $m$ such that $b \sqsubset m(s, \kappa, \sigma)$. Since $b \sqsubset m(s, \kappa, \sigma)$, for all $i < n$, $\kappa(s_i, a_i, i) = s_{i+1}$.
Since $\kappa$ is consistent with $m$, for all $i < n$, $t(s_i, a_i)(s_{i+1}) > 0$. Since $\sigma$ is stationary, $a_i = \sigma(s_i) = \sigma(s_j) =
a_j$ for all $i, j \leq n$ such that $s_i = s_j$.

Suppose that for all $i < n$, $t(s_i, a_i)(s_{i+1}) > 0$ and for all $i \leq n$ and $j \leq n$, $s_i = s_j$ implies that $a_i = a_j$.
Let $s = s_1$. Let $\sigma$ be some strategy such that $\sigma(s_i) = a_i$ for all $i \leq n$. Such a $\sigma$ exists since $s_i = s_j$ implies
that $a_i = a_j$ for all $i \leq n$ and $j \leq n$. Let $\kappa$ be some contingency consistent with $m$ such that for all $i < n$,
$\kappa(s_i, a_i, i) = s_{i+1}$. Such a $\kappa$ exists since for all $i < n$, $t(s_i, a_i)(s_{i+1}) > 0$. Thus, $b \sqsubset m(s, \kappa, \sigma)$. □
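Lemma 1 turns the question of whether a behavior is possible into two local checks, which can be sketched in Python as follows. The encoding of a behavior as a list of (state, action) pairs, the dictionary encoding of $t$, and the small transition table are assumptions made for illustration; treating a pair missing from the table as unavailable is also an assumption, since the formalism defines $t$ on every state-action pair.

```python
def is_possible(behavior, t):
    """Check the two local conditions of Lemma 1.

    behavior: list of (state, action) pairs [(s1, a1), ..., (sn, an)].
    t: dict mapping (state, action) to {successor: probability}.
    """
    chosen = {}
    for s, a in behavior:
        if (s, a) not in t:
            return False  # assumed: a pair absent from t is not available
        if chosen.setdefault(s, a) != a:
            return False  # same state, different action: not stationary
    for (s, a), (s_next, _) in zip(behavior, behavior[1:]):
        if t[(s, a)].get(s_next, 0.0) <= 0.0:
            return False  # transition has zero probability
    return True


# Hypothetical transition table used only to exercise the check.
t = {("s1", "go"): {"s1": 0.5, "s2": 0.5},
     ("s1", "stay"): {"s1": 1.0},
     ("s2", "go"): {"s2": 1.0},
     ("s2", "stay"): {"s2": 1.0}}

ok = is_possible([("s1", "go"), ("s1", "go"), ("s2", "stay")], t)
not_stationary = is_possible([("s1", "go"), ("s1", "stay")], t)
impossible_step = is_possible([("s1", "stay"), ("s2", "go")], t)
```

The check never enumerates strategies or contingencies, which is the practical benefit of the lemma.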

2.1.2  Non-redundancy 

MDPs  do  not  require  that  strategies  be  non-redundant.  Even  given  that  the  auditee  had  an  execution  e 
from using a strategy $\sigma$ in $\mathsf{opt}(r_p)$, some actions in $e$ might not be for the purpose $p$. The reason is that
some actions may be redundant despite being costless. The MDP optimization criterion behind $\mathsf{opt}$ prevents
redundant actions from delaying the achievement of a goal as the reward associated with that goal would be
further discounted, making such redundant actions sub-optimal. However, the optimization criterion is not
affected  by  redundant  actions  when  they  appear  after  all  actions  that  provide  non-zero  rewards.  Intuitively, 
the  hypothetical  agent  planning  only  for  the  purpose  in  question  would  not  perform  such  unneeded  actions 
even  if  they  have  zero  reward.  Thus,  to  create  our  formalism  of  non-redundant  MDPs  (NMDPs),  we  replace 
opt  with  a  new  optimization  criterion  nopt  that  prevents  these  redundant  actions  while  maintaining  the  same 
transition  structure  as  a  standard  MDP. 

To  account  for  redundant  actions,  we  must  first  contrast  such  actions  with  doing  nothing.  Thus,  we 
introduce  a  distinguished  action  stop  that  stands  for  stopping  and  doing  nothing.  For  all  states  s,  stop  labels 
a transition with zero reward (i.e., $r(s, \mathsf{stop}) = 0$) that is a self-loop (i.e., $t(s, \mathsf{stop})(s) = 1$). (We could
put  stop  on  only  the  subset  of  states  that  represent  possible  stopping  points  by  slightly  complicating  our 
formalism.)  Since  we  only  allow  deterministic  stationary  strategies  and  stop  only  labels  self-loops,  this 
decision  is  irrevocable:  once  the  agent  stops  and  does  nothing,  the  agent  does  nothing  forever.  As  selecting 
to  do  nothing  results  in  only  zero  rewards  henceforth,  it  may  be  viewed  as  stopping  with  the  previously 
acquired  total  discounted  reward. 

Proposition 1. For all NMDPs $m$, strategies $\sigma$ for $m$, and states $s$, if $\sigma(s) = \mathsf{stop}$, then $v_m(\sigma, s) = 0$.

Proof.

\[ v_m(\sigma, s) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i \, r(s_i, \sigma(s_i))\right] \]

where $s_i$ is the $i$th state that the environment modeled by the NMDP enters starting with $s = s_0$.

Proof by induction shows that for all $i$, $s_i = s$. The base case follows from the definition of $s_0$. For the
inductive case, the inductive hypothesis shows that $s_i = s$. Then $s_{i+1} = s'$ with probability $t(s_i, \sigma(s_i))(s') =
t(s, \sigma(s))(s') = t(s, \mathsf{stop})(s') = \mathsf{degen}(s)(s')$ by the definition of NMDPs, where $\mathsf{degen}(s)(s'') = 1$ if and only
if $s'' = s$ and is equal to 0 for all other $s''$. Thus, with certainty, $s_{i+1} = s$.

Thus,

\[ v_m(\sigma, s) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i \, r(s_i, \sigma(s_i))\right] = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i \, r(s, \mathsf{stop})\right] = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i \cdot 0\right] = 0 \]

□


We use the idea of stopping and doing nothing to make formal when one execution contains more actions
than another despite both being of infinite length. Given an execution $e$, let $\mathsf{active}(e)$ denote the prefix of $e$
before the first instance of $\mathsf{stop}$. $\mathsf{active}(e)$ will be equal to $e$ in the case where $e$ does not contain $\mathsf{stop}$. An
execution $e_1$ is a proper sub-execution of an execution $e_2$ if and only if $\mathsf{active}(e_1) \sqsubset \mathsf{active}(e_2)$, where $\sqsubset$ is
the proper-prefix relation. (We also use $\sqsubseteq$ for the prefix-or-equal relation.) Note that if $e_1$ does not contain
$\mathsf{stop}$, it cannot be a proper sub-execution of any execution.
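The following sketch shows one way to compute $\mathsf{active}$ and test the proper sub-execution relation. Since executions are infinite, it works on finite prefixes and assumes, purely for illustration, that each prefix is long enough to contain the first stop of its execution if there is one, and that the second prefix is at least as long as the active part of the first.

```python
STOP = "stop"


def active(execution):
    """Prefix of a flat sequence [s1, a1, s2, a2, ...] before its first STOP."""
    for i in range(1, len(execution), 2):  # odd positions hold actions
        if execution[i] == STOP:
            return list(execution[:i])
    return list(execution)


def is_proper_sub_execution(e1, e2):
    """True if active(e1) is a proper prefix of active(e2)."""
    a1, a2 = active(e1), active(e2)
    if len(a1) == len(e1):
        # No stop seen in e1: it is treated as never stopping, so it
        # cannot be a proper sub-execution of anything.
        return False
    return len(a1) < len(a2) and a2[:len(a1)] == a1


# Two hypothetical prefixes that agree until one stops and the other goes on.
stops = ["s1", "take", "s2", "diagnose", "s6", STOP, "s6", STOP]
keeps_going = ["s1", "take", "s2", "diagnose", "s6", "send", "s6", "send"]

forward = is_proper_sub_execution(stops, keeps_going)
backward = is_proper_sub_execution(keeps_going, stops)
```

The flat encoding keeps the state in which the agent stops inside the active prefix, so two executions that stop in different states are not confused.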

We use contingencies to compare strategies. Given two strategies $\sigma$ and $\sigma'$, we write $\sigma' \prec \sigma$ if and only
if for all contingencies $\kappa$ and states $s$, $m(s, \kappa, \sigma')$ is a proper sub-execution of or equal to $m(s, \kappa, \sigma)$, and for
at least one contingency $\kappa'$ and state $s'$, $m(s', \kappa', \sigma')$ is a proper sub-execution of $m(s', \kappa', \sigma)$. Intuitively,
$\sigma'$ proves that $\sigma$ produces a redundant execution under $\kappa'$ and $s'$. As we would expect, $\prec$ is a strict partial
ordering on strategies.

Proposition 2. $\prec$ is a strict partial order.

Proof. The proper sub-execution relation is a strict partial order. This follows directly from the proper-prefix
relation $\sqsubset$ being a strict partial order. We write $\lhd$ for proper sub-execution and $\unlhd$ for proper sub-execution
or equal.

Now, we show that $\prec$ is also a strict partial ordering.

•  Irreflexivity: for no $\sigma$ is $\sigma \prec \sigma$. For $\sigma \prec \sigma$ to be true, there would have to exist
at least one contingency $\kappa'$ and state $s'$ such that $m(s', \kappa', \sigma)$ is a proper sub-execution of itself. However, this
is impossible since the proper sub-execution relation is a strict partial order.

•  Asymmetry: for all $\sigma_1$ and $\sigma_2$, if $\sigma_1 \prec \sigma_2$, then it is not the case that $\sigma_2 \prec \sigma_1$. To show a contradiction,
suppose $\sigma_1 \prec \sigma_2$ and $\sigma_2 \prec \sigma_1$ are both true. It would have to be the case that for all contingencies $\kappa$
and states $s$, $m(s, \kappa, \sigma_1) \unlhd m(s, \kappa, \sigma_2)$ and $m(s, \kappa, \sigma_2) \unlhd m(s, \kappa, \sigma_1)$. Since $\unlhd$ is the reflexive closure of
a strict partial order, it is antisymmetric, so for all $s$ and $\kappa$, $m(s, \kappa, \sigma_1) = m(s, \kappa, \sigma_2)$. Thus, there cannot exist a contingency $\kappa'$
and state $s'$ such that $m(s', \kappa', \sigma_2) \lhd m(s', \kappa', \sigma_1)$. Then $\sigma_2 \prec \sigma_1$ cannot be true, a contradiction.

•  Transitivity: for all $\sigma_1$, $\sigma_2$, and $\sigma_3$, if $\sigma_1 \prec \sigma_2$ and $\sigma_2 \prec \sigma_3$, then $\sigma_1 \prec \sigma_3$. Suppose $\sigma_1 \prec \sigma_2$
and $\sigma_2 \prec \sigma_3$. Then for all contingencies $\kappa$ and states $s$, $m(s, \kappa, \sigma_1) \unlhd m(s, \kappa, \sigma_2)$ and
$m(s, \kappa, \sigma_2) \unlhd m(s, \kappa, \sigma_3)$. Since $\unlhd$ is transitive, this implies that $m(s, \kappa, \sigma_1) \unlhd m(s, \kappa, \sigma_3)$ for
all $\kappa$ and $s$.

Furthermore, it must be the case that there exists a contingency $\kappa'$ and state $s'$ such that $m(s', \kappa', \sigma_1) \lhd
m(s', \kappa', \sigma_2)$. From above, $m(s', \kappa', \sigma_2) \unlhd m(s', \kappa', \sigma_3)$. Thus, by transitivity, $m(s', \kappa', \sigma_1) \lhd
m(s', \kappa', \sigma_3)$ as needed. This implies that $\sigma_1 \prec \sigma_3$.

□

We define $\mathsf{nopt}(m)$ to be the subset of $\mathsf{opt}(m)$ holding only strategies $\sigma$ such that for no $\sigma' \in \mathsf{opt}(m)$
does $\sigma' \prec \sigma$. $\mathsf{nopt}(m)$ is the set of non-redundant optimal strategies.

The next lemma converts the requirements for being non-redundant from being about the executions of
an MDP to being a local property. It uses the definition that $q^*_m(s, a) = r(s, a) + \gamma \sum_{s'} t(s, a)(s') \cdot v^*_m(s')$,
where $v^*_m(s) = \max_{\sigma} v_m(\sigma, s)$, and the proof uses that $q_m(\sigma, s, a) = r(s, a) + \gamma \sum_{s'} t(s, a)(s') \cdot v_m(\sigma, s')$.
($q^*$ is typically written as $Q^*$, but we reserve $Q^*$ for POMDPs in Chapter 3.)

Lemma 2. For all NMDPs $m$ and $\sigma$ in $\mathsf{opt}(m)$, $\sigma$ is in $\mathsf{nopt}(m)$ if and only if for all states $s$ such that
$\sigma(s) \neq \mathsf{stop}$, $q^*_m(s, \sigma(s)) > 0$.

Proof. If Direction. Suppose that for all $s$ such that $\sigma(s) \neq \mathsf{stop}$, $q^*_m(s, \sigma(s)) > 0$. For the purposes
of showing a contradiction, assume that $\sigma \notin \mathsf{nopt}(m)$. Then there exists $\sigma'$ such that $\sigma' \in \mathsf{opt}(m)$
and $\sigma' \prec \sigma$. This implies that there exist $\kappa'$ and $s'$ such that $\mathsf{active}(m(s', \kappa', \sigma'))$ is a strict prefix of
$\mathsf{active}(m(s', \kappa', \sigma))$. $m(s', \kappa', \sigma')$ must have the form $[s_1, a_1, s_2, \ldots, s_n, \mathsf{stop}, \ldots]$ and $m(s', \kappa', \sigma)$ must
have the form $[s_1, a_1, s_2, \ldots, s_n, a_n, \ldots]$ for some $n$ where $a_n \neq \mathsf{stop}$. Since $\sigma(s_n) = a_n \neq \mathsf{stop}$,
$q^*_m(s_n, \sigma(s_n)) > 0$. Since both $\sigma$ and $\sigma'$ are in $\mathsf{opt}(m)$, $0 < q^*_m(s_n, \sigma(s_n)) = q^*_m(s_n, \sigma'(s_n)) = q^*_m(s_n, \mathsf{stop}) =
q_m(\sigma', s_n, \mathsf{stop})$. However, by Proposition 1, $q_m(\sigma', s_n, \mathsf{stop}) = v_m(\sigma', s_n) = 0$, a contradiction. Thus, our
assumption that $\sigma \notin \mathsf{nopt}(m)$ is false and $\sigma$ is in $\mathsf{nopt}(m)$.

Only-If Direction. Suppose $\sigma$ is in $\mathsf{nopt}(m)$. Consider a state $s$ such that $\sigma(s) \neq \mathsf{stop}$. Since $\sigma$ is in
$\mathsf{nopt}(m)$, there exists no $\sigma'$ in $\mathsf{opt}(m)$ such that $\sigma' \prec \sigma$. That is, there exists no $\sigma'$ such that $\sigma'$ is in $\mathsf{opt}(m)$;
for all contingencies $\kappa'$ consistent with $m$ and states $s'$, $\mathsf{active}(m(s', \kappa', \sigma')) \sqsubseteq \mathsf{active}(m(s', \kappa', \sigma))$; and there
exists a contingency $\kappa''$ and state $s''$ such that $\mathsf{active}(m(s'', \kappa'', \sigma')) \sqsubset \mathsf{active}(m(s'', \kappa'', \sigma))$. That is, for all $\sigma'$,
either (1) $\sigma'$ is not in $\mathsf{opt}(m)$; (2) it is not the case that for all contingencies $\kappa'$ consistent with $m$ and states $s'$,
$\mathsf{active}(m(s', \kappa', \sigma')) \sqsubseteq \mathsf{active}(m(s', \kappa', \sigma))$; or (3) it is not the case that there exists a contingency $\kappa''$ and a
state $s''$ such that $\mathsf{active}(m(s'', \kappa'', \sigma')) \sqsubset \mathsf{active}(m(s'', \kappa'', \sigma))$.

We consider each of those three possibilities for the $\sigma'$ that is equal to $\sigma$ except $\sigma'(s) = \mathsf{stop}$.

1.  Case: $\sigma'$ is not in $\mathsf{opt}(m)$. Since $\sigma'$ is not in $\mathsf{opt}(m)$, there must exist $s^\dagger$ such that $\sigma'(s^\dagger) \notin
\operatorname{argmax}_a q^*_m(s^\dagger, a)$. Since $\sigma$ is in $\mathsf{opt}(m)$, for all $s' \neq s$, $\sigma'(s') = \sigma(s') \in \operatorname{argmax}_a q^*_m(s', a)$. Thus,
$s^\dagger$ must be $s$. Since $\sigma'(s) \notin \operatorname{argmax}_a q^*_m(s, a)$ and $q^*_m(s, \sigma'(s)) = q^*_m(s, \mathsf{stop}) = 0$, $\max_a q^*_m(s, a) =
v^*_m(s) > 0$. Since $\sigma$ is in $\mathsf{opt}(m)$, $q^*_m(s, \sigma(s)) = v^*_m(s) > 0$.

2.  Case: It is not the case that for all contingencies $\kappa'$ consistent with $m$, and for all states $s'$,

\[ \mathsf{active}(m(s', \kappa', \sigma')) \sqsubseteq \mathsf{active}(m(s', \kappa', \sigma)) \]

For all $\kappa'$ and $s'$, $m(s', \kappa', \sigma)$ and $m(s', \kappa', \sigma')$ only differ if they reach the state $s$ since $\sigma$ and $\sigma'$
only differ at the state $s$. If $s$ is never reached, then $\mathsf{active}(m(s', \kappa', \sigma')) = \mathsf{active}(m(s', \kappa', \sigma))$. If
$s$ is reached, then $m(s', \kappa', \sigma')$ has the form $[s', a_1, s_2, a_2, \ldots, s, \mathsf{stop}, \ldots]$ and $m(s', \kappa', \sigma)$ has the
form $[s', a_1, s_2, a_2, \ldots, s, \sigma(s), \ldots]$. Thus, either way, $\mathsf{active}(m(s', \kappa', \sigma')) \sqsubseteq \mathsf{active}(m(s', \kappa', \sigma))$.
Thus, it is the case that for all contingencies $\kappa'$ consistent with $m$ and states $s'$, $\mathsf{active}(m(s', \kappa', \sigma')) \sqsubseteq
\mathsf{active}(m(s', \kappa', \sigma))$. Since this is a contradiction, the result trivially holds.

3.  Case: There does not exist a contingency $\kappa''$ and a state $s''$ such that

\[ \mathsf{active}(m(s'', \kappa'', \sigma')) \sqsubset \mathsf{active}(m(s'', \kappa'', \sigma)) \]

Let $s''$ be $s$. Then for all $\kappa''$, $m(s'', \kappa'', \sigma') = m(s, \kappa'', \sigma')$ has the form $[s, \mathsf{stop}, \ldots]$. $m(s, \kappa'', \sigma)$ has
the form $[s, \sigma(s), \ldots]$ for some $\sigma(s) \neq \mathsf{stop}$. Thus, there exists a contingency $\kappa''$ and $s''$ such that
$\mathsf{active}(m(s'', \kappa'', \sigma')) \sqsubset \mathsf{active}(m(s'', \kappa'', \sigma))$. Since this is a contradiction, the result trivially holds.

Thus, the result holds under all three possible cases. □

One  of  the  reasons  that  the  MDP  model  is  useful  is  that  an  optimal  strategy  is  guaranteed  to  exist. 
Fortunately,  we  can  prove  that  nopt(m)  is  also  guaranteed  to  be  non-empty.  One  way  to  prove  this  result 
would use reasoning about well-ordered sets, Proposition 2, and the fact that the space of all possible strategies
is  finite  for  NMDPs  with  finite  state  and  action  spaces.  However,  we  provide  a  proof  that  depends  more  upon 
the  structure  of  NMDPs  since  it  can  extend  to  NMDPs  with  infinite  state  spaces,  which  becomes  important 
in  the  next  chapter. 

Theorem  1.  For  all  MDPs  m,  nopt(m)  is  not  empty. 

Proof. $\mathsf{opt}(m)$ is non-empty (see, e.g., [RN03]). Let $\sigma$ be some element of $\mathsf{opt}(m)$. Let $\sigma'$ be $\sigma$ except
whenever $q^*_m(s, \sigma(s)) \leq 0$, $\sigma'(s) = \mathsf{stop}$. For any such state $s$, $q^*_m(s, \sigma(s)) \leq 0 = v_m(\sigma', s) =
q_m(\sigma', s, \sigma'(s))$ by Proposition 1. For all other states, $q^*_m(s, \sigma'(s)) = q^*_m(s, \sigma(s))$ since $\sigma'(s) = \sigma(s)$. In either
case, $v^*_m(s) = q^*_m(s, \sigma(s))$ since $\sigma$ is optimal. Thus, for all states $s$, $v^*_m(s) \leq q^*_m(s, \sigma'(s))$. Thus, $\sigma'$ is in
$\mathsf{opt}(m)$. Furthermore, by construction, for all $s$, $\sigma'(s) \neq \mathsf{stop}$ implies that $q^*_m(s, \sigma'(s)) = q^*_m(s, \sigma(s)) > 0$.
Thus, $\sigma'$ is in $\mathsf{nopt}(m)$ by Lemma 2. □
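The construction in this proof, together with the local test of Lemma 2, can be sketched in a few lines of Python. The dictionary encodings and the tolerance `eps` are assumptions, and the q-values below are illustrative numbers supplied by hand, not output of any solver; the check is only meaningful for a strategy already known to be optimal.

```python
STOP = "stop"


def make_non_redundant(sigma, q_star, eps=1e-9):
    """Theorem 1 construction: replace each action whose optimal q-value
    is not positive with STOP, leaving all other choices unchanged."""
    return {s: (a if q_star[(s, a)] > eps else STOP) for s, a in sigma.items()}


def satisfies_lemma_2(sigma, q_star, eps=1e-9):
    """Local test of Lemma 2, assuming sigma is already an optimal strategy."""
    return all(a == STOP or q_star[(s, a)] > eps for s, a in sigma.items())


# Hand-picked, hypothetical q-values for two states.
q_star = {("s5", "diagnose"): 12.0, ("s6", "send"): 0.0, ("s6", STOP): 0.0}
sigma = {"s5": "diagnose", "s6": "send"}
sigma_prime = make_non_redundant(sigma, q_star)
```

Only states where the chosen action earns nothing are changed, so the construction cannot lower the value of any state.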
Figure 2.1: The MDP $m_{\mathsf{ex1}}$ that the physician used. Circles represent states, block arrows denote possible actions,
and squiggly arrows denote probabilistic outcomes. Self-loops of zero reward under all actions, including the special
action $\mathsf{stop}$, are not shown.


2.1.3  Example:  Modeling  the  Physician’s  Environment 

Suppose  an  auditor  is  inspecting  a  hospital  and  comes  across  a  physician  referring  a  medical  record  to  his 
own  private  practice  for  analysis  of  an  X-ray  as  described  in  Section  1.2.  As  physicians  may  only  make  such 
referrals  for  the  purpose  of  treatment  (treat),  the  auditor  may  find  the  physician’s  behavior  suspicious.  To 
investigate,  the  auditor  may  formally  model  the  hospital  using  our  formalism. 

After  studying  the  hospital  and  how  the  physician’s  actions  affect  it,  the  auditor  would  construct  the 
NMDP $m_{\mathsf{ex1}} = \langle \mathcal{S}_{\mathsf{ex1}}, \mathcal{A}_{\mathsf{ex1}}, t_{\mathsf{ex1}}, r^{\mathsf{treat}}_{\mathsf{ex1}}, \gamma_{\mathsf{ex1}} \rangle$ shown in Figure 2.1. The figure conveys all components of
the NMDP except $\gamma_{\mathsf{ex1}}$. For instance, the block arrow from the state $s_1$ labeled take and the squiggly arrows
leaving it denote that after the agent performs the action take from state $s_1$, the environment will transition
to the state $s_2$ with probability 0.9 and to state $s_4$ with probability of 0.1 (i.e., $t_{\mathsf{ex1}}(s_1, \mathsf{take})(s_2) = 0.9$ and
$t_{\mathsf{ex1}}(s_1, \mathsf{take})(s_4) = 0.1$). The number over the block arrow further indicates the degree to which the action
satisfies the purpose of treat. In this instance, it shows that $r^{\mathsf{treat}}_{\mathsf{ex1}}(s_1, \mathsf{take}) = 0$. This transition models the
physician taking an X-ray. With probability 0.9, he is able to make a diagnosis right away (from state $s_2$);
with probability 0.1, he must send the X-ray to his practice to make a diagnosis. Similarly, the transition
from state $s_4$ models that his practice’s test has a 0.8 success rate of making a diagnosis; with probability
0.2,  no  diagnosis  is  ever  reached.  For  simplicity,  we  assume  that  all  diagnoses  have  the  same  quality  of 
12  and  that  second  opinions  do  not  improve  the  quality;  the  auditor  could  use  a  different  model  if  these 
assumptions  are  false.  (For  simplicity,  in  this  example,  we  construe  the  meaning  of  the  purpose  treatment 
very  narrowly.  An  auditor  could  construe  it  more  broadly  to  include  goals,  such  as  research,  that  improve 
treatment  in  the  long  run.) 

Using the model, the auditor computes $\mathsf{opt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$, which consists of those strategies that maximize the
expected total discounted degree of satisfaction of the purpose of treatment, where the expectation is over the
probabilistic transitions of the model. $\mathsf{opt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$ includes the appropriate strategy $\sigma_1$ where $\sigma_1(s_1) = \mathsf{take}$,
$\sigma_1(s_4) = \mathsf{send}$, $\sigma_1(s_2) = \sigma_1(s_3) = \sigma_1(s_5) = \mathsf{diagnose}$, and $\sigma_1(s_6) = \mathsf{stop}$. Furthermore, $\mathsf{opt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$
excludes the redundant strategy $\sigma_2$ that performs a redundant send, where $\sigma_2$ is the same as $\sigma_1$ except for
$\sigma_2(s_2) = \mathsf{send}$. Performing the extra action send delays the reward of 12 for achieving a diagnosis, resulting
in its discounted reward being $\gamma^2_{\mathsf{ex1}} \cdot 12$ instead of $\gamma_{\mathsf{ex1}} \cdot 12$ and, thus, the strategy is not optimal.

However, $\mathsf{opt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$ does include the redundant strategy $\sigma_3$ that is the same as $\sigma_1$ except for $\sigma_3(s_6) =
\mathsf{send}$. $\mathsf{opt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$ includes this strategy despite the send actions from state $s_6$ being redundant since no
positive rewards follow the send actions. Fortunately, $\mathsf{nopt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$ does not include $\sigma_3$ since $\sigma_1$ is both in
$\mathsf{opt}(r^{\mathsf{treat}}_{\mathsf{ex1}})$ and $\sigma_1 \prec \sigma_3$. To see that $\sigma_1 \prec \sigma_3$, note that for every contingency $\kappa$ and state $s$, $m_{\mathsf{ex1}}(s, \kappa, \sigma_1)$
has the form $b$ followed by an infinite sequence of stop actions (interleaved with the state $s_6$) for some finite prefix $b$.
For the same $\kappa$, $m_{\mathsf{ex1}}(s, \kappa, \sigma_3)$ has the form $b$ followed by an infinite sequence of send actions (interleaved
with the state $s_6$) for the same $b$. Thus, $m_{\mathsf{ex1}}(s, \kappa, \sigma_1)$ is a proper sub-execution of $m_{\mathsf{ex1}}(s, \kappa, \sigma_3)$.

The above modeling implies that the strategy $\sigma_1$ can be for the purpose of treatment but $\sigma_2$ and $\sigma_3$ cannot
be for treatment.
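The three strategies can be checked mechanically with a short Python sketch. Figure 2.1 is not reproduced here, so the transition table below is a reconstruction from the prose and should be read as an assumption: in particular, the successor of a failed send from $s_4$ (taken to be $s_6$), the zero rewards on take and send, and the discount factor 0.9 are all hypothetical choices. Membership in the optimal set is tested with the standard per-state argmax criterion, and non-redundancy with the local test of Lemma 2.

```python
STOP = "stop"
STATES = ["s1", "s2", "s3", "s4", "s5", "s6"]
ACTIONS = ["take", "send", "diagnose", STOP]
GAMMA = 0.9  # assumed: the text does not give the discount factor

# (successor distribution, reward) for the arrows described in the prose.
# Assumed: a failed send from s4 leads to s6; take and send earn 0.
DRAWN = {
    ("s1", "take"): ({"s2": 0.9, "s4": 0.1}, 0.0),
    ("s2", "send"): ({"s3": 1.0}, 0.0),
    ("s4", "send"): ({"s5": 0.8, "s6": 0.2}, 0.0),
    ("s2", "diagnose"): ({"s6": 1.0}, 12.0),
    ("s3", "diagnose"): ({"s6": 1.0}, 12.0),
    ("s5", "diagnose"): ({"s6": 1.0}, 12.0),
}


def step(s, a):
    """Every pair not listed above is a zero-reward self-loop."""
    return DRAWN.get((s, a), ({s: 1.0}, 0.0))


def q(s, a, v):
    dist, reward = step(s, a)
    return reward + GAMMA * sum(p * v[s2] for s2, p in dist.items())


def solve(tol=1e-12):
    """Value iteration for the optimal state values."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: max(q(s, a, v) for a in ACTIONS) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v


V = solve()
EPS = 1e-9


def in_opt(sigma):
    return all(q(s, sigma[s], V) >= V[s] - EPS for s in STATES)


def in_nopt(sigma):
    return in_opt(sigma) and all(
        sigma[s] == STOP or q(s, sigma[s], V) > EPS for s in STATES)


sigma1 = {"s1": "take", "s2": "diagnose", "s3": "diagnose",
          "s4": "send", "s5": "diagnose", "s6": STOP}
sigma2 = dict(sigma1, s2="send")  # redundant send before diagnosing
sigma3 = dict(sigma1, s6="send")  # redundant send after all rewards
```

Under these assumptions the sketch agrees with the discussion above: the first strategy is optimal and non-redundant, the second is not optimal, and the third is optimal but redundant.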


2.2  Auditing 

In  the  above  example,  the  auditor  constructed  a  model  of  the  environment  in  which  the  auditee  operates.  The 
auditor  must  use  the  model  to  determine  if  the  auditee  obeyed  the  policy.  We  first  discuss  this  process  for 
auditing  exclusivity  policy  rules  and  revisit  the  above  example.  Then,  we  discuss  the  process  for  prohibitive 
policy  rules.  In  the  next  section,  we  provide  an  auditing  algorithm  that  automates  comparing  the  auditee’s 
behavior,  as  recorded  in  a  log,  to  the  set  of  allowed  behaviors. 

2.2.1  Auditing  Exclusivity  Rules 

Suppose  that  an  auditor  would  like  to  determine  whether  an  auditee  performed  some  logged  actions  only 
for  the  purpose  p.  The  auditor  can  compare  the  logged  behavior  to  the  behavior  that  a  hypothetical  agent 
would perform when planning for the purpose $p$. In particular, the hypothetical agent selects a strategy from
$\mathsf{nopt}(\langle \mathcal{S}, \mathcal{A}, t, r_p, \gamma \rangle)$ where $\mathcal{S}$, $\mathcal{A}$, and $t$ model the environment of the auditee; $r_p$ is a reward function
modeling the degree to which the purpose $p$ is satisfied; and $\gamma$ is an appropriately selected discounting
factor. If the logged behavior of the auditee would never have been performed by the hypothetical agent,
then  the  auditor  knows  that  the  auditee  violated  the  policy. 

In particular, the auditor must consider all the possible behaviors the hypothetical agent could have
performed. For a model $m$, let $\mathsf{nbehv}(r_p)$ represent this set, where a finite prefix $b$ of an execution is in
$\mathsf{nbehv}(r_p)$ if and only if there exists a strategy $\sigma$ in $\mathsf{nopt}(r_p)$, a contingency $\kappa$, and a state $s$ such that $b$ is a
prefix of $m(s, \kappa, \sigma)$.

The auditor must compare $\mathsf{nbehv}(r_p)$ to the set of all behaviors that could have caused the auditor to
observe the log that he did. We presume that the log $\ell$ was created by a process $\mathsf{log}$ that records features
of the current behavior. That is, $\mathsf{log} : B \to L$ where $B$ is the set of behaviors and $L$ the set of logs, and
$\ell = \mathsf{log}(b)$ where $b$ is the prefix of the actual execution of the environment available at the time of auditing.
The auditor must consider all the behaviors in $\mathsf{log}^{-1}(\ell) = \{\, b \in B \mid \mathsf{log}(b) = \ell \,\}$ as possible, where $\mathsf{log}^{-1}$ is
the inverse of the logging function. In the best case for the auditor, the log records the whole prefix $b$ of the
execution that transpired until the time of auditing, in which case $\mathsf{log}^{-1}(\ell) = \{ b \}$. However, the log may be
incomplete by missing actions, or may include only partial information about an action such as that it was
one of a set of actions.

If $\mathsf{log}^{-1}(\ell) \cap \mathsf{nbehv}(r_p)$ is empty, then the auditor may conclude that the auditee did not plan for the
purpose $p$, and, thus, violated the rule that the auditee must only perform the actions recorded in $\ell$ for the
purpose $p$; otherwise, the auditor must consider it possible that the auditee planned for the purpose $p$.

If $\log^{-1}(\ell) \subseteq \mathrm{nbehv}(r^p)$, the auditor might be tempted to conclude that the auditee surely obeyed the policy rule. However, as illustrated in the second example below, this is not necessarily true. The problem is that $\log^{-1}(\ell)$ might have a non-empty intersection with $\mathrm{nbehv}(r^{p'})$ for some other purpose $p'$. In this case, the auditee might actually have been planning for the purpose $p'$ instead of $p$. Indeed, given the likelihood of such other purposes in non-trivial scenarios, we consider proving compliance practically impossible. However, this limitation is of little consequence: $\log^{-1}(\ell) \subseteq \mathrm{nbehv}(r^p)$ does imply that the auditee is behaving as though he is obeying the policy. That is, in the worst case, the auditee is still doing the right things even if for the wrong reasons.
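The three outcomes discussed above can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that both $\log^{-1}(\ell)$ and the relevant part of $\mathrm{nbehv}(r^p)$ are available as finite sets of behaviors encoded as tuples. The function name `classify_exclusivity_audit`, the outcome labels, and the sample behaviors are hypothetical and do not come from the formalism itself.

```python
def classify_exclusivity_audit(candidates, allowed):
    """Classify an audit of the rule "only for purpose p".

    candidates: finite set of behaviors consistent with the log, i.e. log^{-1}(l).
    allowed:    finite set of behaviors in nbehv(r^p). Assumption: the set can
                be enumerated, which need not hold in general.
    """
    if not candidates & allowed:
        # No behavior consistent with the log is part of a plan for p.
        return "violation"
    if candidates <= allowed:
        # Every consistent behavior looks like planning for p. Compliance is
        # still unproven: another purpose p' may explain the log equally well.
        return "behaves-as-if-compliant"
    # Some, but not all, consistent behaviors are plans for p.
    return "inconclusive"


# Hypothetical behaviors, written as flat tuples of states and actions.
b_ok = ("s1", "take", "s4", "send")
b_bad = ("s1", "take", "s2", "send")
print(classify_exclusivity_audit({b_bad}, {b_ok}))        # violation
print(classify_exclusivity_audit({b_ok}, {b_ok}))         # behaves-as-if-compliant
print(classify_exclusivity_audit({b_ok, b_bad}, {b_ok}))  # inconclusive
```

Only the first outcome lets the auditor draw a firm conclusion, which matches the observation that proving compliance is practically impossible.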

2.2.2  Example:  Auditing  the  Physician 

Below  we  revisit  the  example  of  Section  2.1.3.  We  consider  two  cases.  In  the  first,  the  auditor  shows  that 
the  physician  violated  the  policy.  In  the  second,  auditing  is  inconclusive. 

Violation Found. Suppose that, after constructing the model as above in Section 2.1.3, the auditor maps the actions recorded in the access log $\ell_1$ to the actions of the model $m_{\mathrm{ex1}}$, and finds that $\log^{-1}(\ell_1)$ holds only a single behavior: $b_1 = [s_1, \mathrm{take}, s_2, \mathrm{send}, s_3, \mathrm{diagnose}, s_6, \mathrm{stop}, s_6, \mathrm{stop}]$. Next, using $\mathrm{nopt}(r^{\mathrm{treat}}_{\mathrm{ex1}})$, as computed above, the auditor constructs the set $\mathrm{nbehv}(r^{\mathrm{treat}}_{\mathrm{ex1}})$ of all behaviors an agent planning for treatment might exhibit. The auditor would find that $b_1$ is not in $\mathrm{nbehv}(r^{\mathrm{treat}}_{\mathrm{ex1}})$.

To see this, note that every execution $e_1$ that has $b_1$ as a prefix is generated from a strategy $\sigma$ such that $\sigma(s_2) = \mathrm{send}$. The strategy $\sigma_2$ from Section 2.1.3 is one such strategy. None of these strategies are members of $\mathrm{opt}(r^{\mathrm{treat}}_{\mathrm{ex1}})$ for the same reason that $\sigma_2$ is not a member. Thus, $b_1$ cannot be in $\mathrm{nbehv}(r^{\mathrm{treat}}_{\mathrm{ex1}})$. As $\log^{-1}(\ell_1) \cap \mathrm{nbehv}(r^{\mathrm{treat}}_{\mathrm{ex1}})$ is empty, the audit reveals that the physician violated the policy.

Inconclusive. Now suppose that the auditor sees a different log $\ell_2$ such that $\log^{-1}(\ell_2) = \{b_2\}$ where $b_2 = [s_1, \mathrm{take}, s_4, \mathrm{send}, s_5, \mathrm{diagnose}, s_6, \mathrm{stop}, s_6, \mathrm{stop}]$. In this case, our formalism would not find a violation since $b_2$ is in $\mathrm{nbehv}(r^{\mathrm{treat}}_{\mathrm{ex1}})$. In particular, the strategy $\sigma_4$ from above produces the behavior $b_2$ under the contingency that selects the bottom probabilistic transition from state $s_1$ to state $s_4$ under the action take.

Nevertheless, the auditor cannot be sure that the physician obeyed the policy. For example, consider the NMDP $m'_{\mathrm{ex1}}$ that is $m_{\mathrm{ex1}}$ altered to use the reward function $r^{\mathrm{profit}}_{\mathrm{ex1}}$ instead of $r^{\mathrm{treat}}_{\mathrm{ex1}}$. The function $r^{\mathrm{profit}}_{\mathrm{ex1}}$ assigns a reward of zero to all transitions except for the send actions from states $s_2$ and $s_4$, to which it assigns a reward of 9. The strategy $\sigma_4$ is in $\mathrm{nopt}(r^{\mathrm{profit}}_{\mathrm{ex1}})$, meaning that not only the same actions (those in $b_2$), but even the exact same strategy can be either for the allowed purpose treat or the disallowed purpose profit. Thus, if the physician did refer the record to his practice for profit, he cannot be caught as he has tenable deniability of his ulterior motive of profit.

2.2.3  Auditing  Prohibitive  Rules 

In  the  above  example,  the  auditor  was  enforcing  the  rule  that  the  physician’s  actions  be  only  for  treatment. 
Now,  consider  auditing  to  enforce  the  rule  that  the  physician’s  actions  are  not  for  personal  profit.  To  obey 
this  purpose  restriction,  the  auditee  need  not  have  attempted  to  minimize  the  degree  of  satisfaction  of  the 
purpose. Rather, the auditee need merely have ignored the prohibited purpose.

To audit for compliance with a rule prohibiting the purpose $p$, after seeing the log $\ell$, the auditor could check whether $\log^{-1}(\ell) \cap \mathrm{nbehv}(r^p)$ is empty. If so, then the auditor knows that the policy was obeyed because the auditee could not have been planning for the purpose $p$. If not, then the auditor can neither prove nor disprove a violation. In the above example, just as the auditor is unsure whether the actions were for the required purpose of treatment, the auditor is unsure whether the actions are not for the prohibited purpose of profit.

AUDITNMDP(m = ⟨S, A, t, r, γ⟩, b = [s_1, a_1, s_2, a_2, ..., s_n, a_n]):
01  if (IMPOSSIBLEMDP(m, b))
02      return true  // behavior impossible for NMDP
03  v*_m := SOLVEMDP(m)
04  for (i := 1; i ≤ n; i++):
05      if (q*(m, v*_m, s_i, a_i) < v*_m(s_i)):
06          return true  // action suboptimal
07      if (q*(m, v*_m, s_i, a_i) ≤ 0 and a_i ≠ stop):
08          return true  // action redundant
09  return false

Figure 2.2: The algorithm AuditNMDP. SolveMDP may be any MDP solving algorithm. Figure 2.3 shows ImpossibleMDP.

An auditor might decide to investigate some of the cases where $\log^{-1}(\ell) \cap \mathrm{nbehv}(r^p)$ is not empty. In this case, the auditor could limit his attention to only those possible violations of a prohibitive rule that cannot be explained away by some allowed purpose. For example, in the inconclusive example above, the physician's actions can be explained with the allowed purpose of treatment. As the physician has tenable deniability, it is unlikely that investigating his actions would be a productive use of the auditor's time. Thus, the auditor should limit his attention to those logs $\ell$ such that both $\log^{-1}(\ell) \cap \mathrm{nbehv}(r^{\mathrm{profit}}_{\mathrm{ex1}})$ is non-empty and $\log^{-1}(\ell) \cap \mathrm{nbehv}(r^{\mathrm{treat}}_{\mathrm{ex1}})$ is empty.
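The triage condition just stated might be sketched as follows. The sketch assumes that the three sets involved are finite and explicitly enumerated, which is an illustrative simplification; the function name `worth_investigating` and the sample behaviors are hypothetical.

```python
def worth_investigating(candidates, nbehv_prohibited, nbehv_allowed):
    """Triage for a prohibitive rule.

    Flag a log only if some behavior consistent with it could be a plan for
    the prohibited purpose and no consistent behavior can be explained by
    the allowed purpose.

    candidates:       log^{-1}(l), as a finite set of behaviors.
    nbehv_prohibited: behaviors in nbehv for the prohibited purpose.
    nbehv_allowed:    behaviors in nbehv for the allowed purpose.
    """
    explained_by_prohibited = bool(candidates & nbehv_prohibited)
    explained_by_allowed = bool(candidates & nbehv_allowed)
    return explained_by_prohibited and not explained_by_allowed


# Hypothetical encoding of the inconclusive case: the single candidate
# behavior is consistent with both treatment and profit.
b2 = ("s1", "take", "s4", "send", "s5", "diagnose")
print(worth_investigating({b2}, {b2}, {b2}))   # False: tenable deniability
print(worth_investigating({b2}, {b2}, set()))  # True: no allowed explanation
```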

A  similar  additional  check  using  disallowed  purposes  could  be  applied  to  enforcing  exclusivity  rules. 
However,  for  exclusivity  rules,  this  check  would  identify  cases  where  the  auditee’s  behavior  could  have 
been  either  for  the  allowed  purpose  or  a  disallowed  purpose.  Thus,  it  would  serve  to  find  additional  cases  to 
investigate  and  increase  the  auditor’s  workload  rather  than  reduce  it.  Furthermore,  the  auditee  would  have 
tenable  deniability  for  these  possible  ulterior  motives,  making  these  investigations  a  poor  use  of  the  auditor’s 
time. 


2.3  Auditing  Algorithm 

We  would  like  to  automate  the  auditing  process  described  above.  To  this  end,  we  present  in  Figure  2.2  an 
algorithm  AuditNMDP  that  aids  the  auditor  in  comparing  the  log  to  the  set  of  allowed  behaviors.  The 
algorithm is closely related to a goal inference algorithm that uses MDPs [BTS06, BST09], but our algorithm
focuses  on  soundness  rather  than  predictive  ability.  (See  Section  7.4  for  a  more  detailed  discussion.) 

Since  we  are  not  interested  in  the  details  of  the  logging  process  and  would  like  to  focus  on  the  planning 
aspects of our semantics, we limit our attention to the case where $\log(b) = b$ (i.e., the log is simply the
behavior  of  the  auditee).  However,  future  work  could  extend  our  algorithm  to  handle  incomplete  logs  by 
constructing  the  set  of  all  possible  behaviors  that  could  give  rise  to  that  log. 

The algorithm presumes that the MDP $m$ is finite. That is, both $S$ and $A$ are finite. As proved below (Theorem 2), AuditNMDP($m$, $b$) returns true if and only if $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty. In the case of an exclusivity rule, the auditor may conclude that the policy was violated when AuditNMDP returns true. In the case of a prohibitive rule, the auditor may conclude that the policy was obeyed when AuditNMDP returns true.

IMPOSSIBLEMDP(m = ⟨S, A, t, r, γ⟩, b = [s_1, a_1, s_2, a_2, ..., s_n, a_n]):
11  for (i := 1; i ≤ n; i++):
12      if (s_i ∉ S):
13          return true  // s_i is not a state
14      if (a_i ∉ A):
15          return true  // a_i is not an action
16  for (i := 1; i < n; i++):
17      if (t(s_i, a_i)(s_{i+1}) ≤ 0):
18          return true  // s_{i+1} unreachable from s_i
19      for (j := i + 1; j ≤ n; j++):
20          if (s_i = s_j and a_i ≠ a_j):
21              return true  // no stationary strategy could have produced the behavior
22  return false

Figure 2.3: The algorithm ImpossibleMDP. Returns true if and only if the given behavior is impossible for the given MDP.

The algorithm operates by checking a series of local conditions of the NMDP $m$ and behavior $b$ that are equivalent to the global property of whether $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty (as proved by Lemma 3). First, AuditNMDP checks whether the behavior $b$ is possible for $m$ using the sub-routine ImpossibleMDP shown in Figure 2.3. ImpossibleMDP checks whether every state and action is valid (Lines 12 and 14), every state is reachable from the state preceding it (Line 17), and that the same action is performed from equal states in $b$ (Line 20).
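As an illustration of these three checks, the following Python sketch mirrors ImpossibleMDP under an assumed dictionary-based encoding: `t` maps a (state, action) pair to a dictionary from successor states to probabilities, and a behavior is a list of (state, action) pairs. The encoding, the function name, and the toy model are assumptions made for this sketch. The pairwise comparison of Lines 19 to 21 is replaced by a single pass that remembers the action chosen in each state, which yields the same Boolean result.

```python
def impossible_mdp(states, actions, t, behavior):
    """Return True iff `behavior` cannot be produced by the MDP.

    states, actions: sets of valid states and actions.
    t: dict mapping (state, action) to a dict {next_state: probability}.
    behavior: list of (state, action) pairs [(s_1, a_1), ..., (s_n, a_n)].
    """
    # Lines 11-15: every state and action must belong to the model.
    for s, a in behavior:
        if s not in states or a not in actions:
            return True
    # Lines 16-18: each recorded successor must have non-zero probability.
    for (s, a), (s_next, _) in zip(behavior, behavior[1:]):
        if t.get((s, a), {}).get(s_next, 0.0) <= 0:
            return True
    # Lines 19-21: a stationary strategy picks one action per state.
    chosen = {}
    for s, a in behavior:
        if chosen.setdefault(s, a) != a:
            return True
    return False


# Hypothetical two-state model used only to exercise the checks.
toy_states = {"s1", "s2"}
toy_actions = {"go", "stop"}
toy_t = {
    ("s1", "go"): {"s2": 1.0},
    ("s1", "stop"): {"s1": 1.0},
    ("s2", "go"): {"s2": 1.0},
    ("s2", "stop"): {"s2": 1.0},
}
print(impossible_mdp(toy_states, toy_actions, toy_t,
                     [("s1", "go"), ("s2", "stop"), ("s2", "stop")]))  # False
```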

Next, AuditNMDP checks whether the behavior $b$ is optimal (Line 05) and non-redundant (Line 07). To do so, AuditNMDP uses a sub-routine SolveMDP to compute $v^*_m$, which for each state $s$ records $v^*_m(s)$, the optimal value for $s$. The fact that NMDPs are a type of MDP allows AuditNMDP to use any MDP optimization algorithm for SolveMDP, such as reducing the optimization to a system of linear equations [d'E63].

AuditNMDP uses a function $q^*$ that computes $q^*_m$ from $v^*_m$:

$$q^*(m, v^*_m, s, a) = r(s, a) + \gamma \sum_{s' \in S} t(s, a)(s') \cdot v^*_m(s')$$

Thus, $q^*(m, v^*_m, s, a)$ is equal to $q^*_m(s, a)$.
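The computation of $q^*$ from a table of state values is a one-step lookahead, which can be sketched in Python as follows. The dictionary encoding of `t`, `r`, and `v` and the numbers in the example are assumptions chosen for illustration.

```python
def q_value(t, r, gamma, v, s, a):
    """One-step lookahead: r(s, a) + gamma * sum over s' of t(s, a)(s') * v(s').

    t: dict mapping (state, action) to {next_state: probability}.
    r: dict mapping (state, action) to the immediate reward.
    v: dict mapping each state to its (optimal) value.
    """
    expected_future = sum(p * v[s_next] for s_next, p in t[(s, a)].items())
    return r[(s, a)] + gamma * expected_future


# Hypothetical numbers: reward 5, two equally likely successors worth 2 and 4.
example_t = {("s", "a"): {"u": 0.5, "w": 0.5}}
example_r = {("s", "a"): 5.0}
example_v = {"u": 2.0, "w": 4.0}
print(q_value(example_t, example_r, 0.5, example_v, "s", "a"))  # 6.5
```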

The essence of the algorithm is checking whether $\log^{-1}(\ell) \cap \mathrm{nbehv}(m)$ is empty. For simplicity, our algorithm presumes that $\log^{-1}(\ell)$ holds only one behavior. This restriction manifests itself in that each of the local checks (Lines 01, 05, and 07) only considers a single sequence of states and actions.

If $\log^{-1}(\ell)$ holds more than a single behavior but is a small set, then the auditor may run the algorithm for each behavior in $\log^{-1}(\ell)$. Alternatively, in some cases the set $\log^{-1}(\ell)$ may have structure that a modified algorithm could leverage. For example, if the log $\ell$ is missing what action is taken at some states of the execution or only narrows down the taken action to a set of possible alternatives, a conjunction of constraints on the action taken at each state may identify the set. Furthermore, if the log only records some of the states reached by the auditee, the algorithm ImpossibleMDP could be changed to allow for such discontinuities.
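For the small-set case, running the single-behavior audit once per candidate amounts to the following sketch. It assumes that the candidate behaviors can be enumerated and that `audit_one` is any function returning true exactly when a single behavior is not in $\mathrm{nbehv}(m)$; both names are hypothetical.

```python
def audit_log(candidates, audit_one):
    """Return True iff the intersection of log^{-1}(l) and nbehv(m) is empty.

    candidates: iterable of the behaviors consistent with the log.
    audit_one:  single-behavior audit; True means "not in nbehv(m)".
    """
    # The intersection is empty exactly when every candidate is ruled out.
    return all(audit_one(b) for b in candidates)
```

The cost grows linearly with the number of candidates, which is why the text restricts this approach to small sets.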

2.3.1  Correctness 

To  prove  correctness,  we  use  the  following  lemma  that  allows  us  to  reduce  checking  for  violations  to  local 
properties  of  the  NMDP  and  the  auditee’s  behavior. 

Lemma 3. For an NMDP $m$, the behavior $b = [s_1, a_1, \ldots, s_n, a_n]$ is in $\mathrm{nbehv}(m)$ if and only if $b$ is a possible behavior of $m$, and for all $i \le n$, $q^*_m(s_i, a_i) = v^*_m(s_i)$ and $a_i \ne \mathrm{stop}$ implies that $q^*_m(s_i, a_i) > 0$.

Proof. First, for the only-if direction, suppose $b \in \mathrm{nbehv}(m)$. Since $b$ is in $\mathrm{nbehv}(m)$, there exists a state $s$, a contingency $\kappa$ consistent with $m$, and a strategy $\sigma$ in $\mathrm{nopt}(m)$ such that $b \sqsubseteq m(s, \kappa, \sigma)$. Thus, $b$ is possible since $\kappa$ is consistent with $m$. Since $b \sqsubseteq m(s, \kappa, \sigma)$, for all $i \le n$, $\sigma(s_i) = a_i$. Since $\sigma$ is in $\mathrm{nopt}(m)$, for all $i \le n$, $q^*_m(s_i, a_i) = q^*_m(s_i, \sigma(s_i)) = v^*_m(s_i)$. Since $\sigma$ is in $\mathrm{nopt}(m)$, by Lemma 2, for all $s$ such that $\sigma(s) \ne \mathrm{stop}$, $q^*_m(s, \sigma(s)) > 0$. Thus, for all $i \le n$, $\sigma(s_i) \ne \mathrm{stop}$ implies that $q^*_m(s_i, \sigma(s_i)) > 0$.

Second, for the if direction, suppose $b$ is a possible behavior of $m$, and for all $i \le n$, $q^*_m(s_i, a_i) = v^*_m(s_i)$ and $a_i \ne \mathrm{stop}$ implies that $q^*_m(s_i, a_i) > 0$. By Theorem 1, $\mathrm{nopt}(m)$ is not empty. Let $\sigma$ be some element of $\mathrm{nopt}(m)$. Let $\sigma'$ be identical to $\sigma$ except that for all $i$, $\sigma'(s_i) = a_i$, which is well defined since $b$ is possible. For all $i$, $q^*_m(s_i, \sigma'(s_i)) = q^*_m(s_i, a_i) = v^*_m(s_i)$ and $\sigma'(s_i) = a_i \ne \mathrm{stop}$ implies that $q^*_m(s_i, \sigma'(s_i)) = q^*_m(s_i, a_i) > 0$. For all other states $s$, $q^*_m(s, \sigma'(s)) = q^*_m(s, \sigma(s)) = v^*_m(s)$ and $\sigma'(s) \ne \mathrm{stop}$ implies that $q^*_m(s, \sigma'(s)) > 0$ by Lemma 2 since $\sigma'(s) = \sigma(s)$. Thus, for all $s$, $q^*_m(s, \sigma'(s)) = v^*_m(s)$, which implies that $\sigma'$ is in $\mathrm{opt}(m)$. Furthermore, for all $s$, $\sigma'(s) \ne \mathrm{stop}$ implies that $q^*_m(s, \sigma'(s)) > 0$, which implies that $\sigma'$ is in $\mathrm{nopt}(m)$ by Lemma 2.

By Lemma 1, $b$ being possible implies that for all $i < n$, $t(s_i, a_i)(s_{i+1}) > 0$. Thus, there exists a contingency $\kappa$ that is consistent with $m$ such that $\kappa(s_i, a_i, i) = s_{i+1}$. Furthermore, $b \sqsubseteq m(s, \kappa, \sigma')$ for $s = s_1$. Thus, since $\sigma'$ is in $\mathrm{nopt}(m)$, $b$ is in $\mathrm{nbehv}(m)$. □

The  above  lemma  combines  with  reasoning  about  the  actual  code  of  the  program  to  yield  its  correctness. 
First,  we  prove  the  correctness  of  ImpossibleMDP  as  a  lemma. 

Lemma 4. For all MDPs $m$ and behaviors $b$, ImpossibleMDP($m$, $b$) is a decision procedure for whether $b$ is not a possible behavior of $m$.

Proof. To show that ImpossibleMDP is a decision procedure, we must show that it always terminates, that $b$ is not possible for $m$ if and only if ImpossibleMDP($m$, $b$) returns true, and that $b$ is possible for $m$ if and only if ImpossibleMDP($m$, $b$) returns false.

To show that ImpossibleMDP terminates, note that all the for loops involve a monotonically increasing counter ($i$ or $j$) and that they all terminate after the counter reaches a finite number ($n$ or $n + 1$).

ImpossibleMDP returns true if and only if one of the following is true: (1) there exists $i \le n$ such that $s_i$ is not a state of $m$, (2) there exists $i \le n$ such that $a_i$ is not an action of $m$, (3) there exists $i < n$ such that $t(s_i, a_i)(s_{i+1}) \le 0$, (4) there exist $i < n$ and $j$ where $i < j \le n$ such that $s_i = s_j$ and $a_i \ne a_j$. ImpossibleMDP returns false if and only if all of the conditions (1), (2), (3), and (4) are false. Conditions (1) and (2) are both false if and only if $b$ is in $(S \times A)^*$. Condition (3) is false if and only if for all $i < n$, $t(s_i, a_i)(s_{i+1}) > 0$. Condition (4) is false if and only if for all $i \le n$ and $j \le n$, $s_i = s_j$ implies that $a_i = a_j$.

Thus, by Lemma 1, $b$ is possible for $m$ if and only if the conditions (1), (2), (3), and (4) are all false, which is exactly when ImpossibleMDP returns false. Furthermore, $b$ is not possible for $m$ if and only if one of the conditions (1), (2), (3), and (4) is true, which is exactly when ImpossibleMDP returns true. □

Theorem 2. For all finite NMDPs $m$ and behaviors $b$, AuditNMDP is a decision procedure for whether $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty.

Proof. To show that AuditNMDP is a decision procedure, we must show that it always terminates, that $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty if and only if AuditNMDP($m$, $b$) returns true, and that $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is non-empty if and only if AuditNMDP($m$, $b$) returns false.

To show that AuditNMDP terminates, note that SolveMDP is also guaranteed to terminate because $m$ is finite. Thus, each iteration of the for loop terminates. Furthermore, $n$ is a finite number and $i$ monotonically increases toward it. Thus, the loop will execute only a finite number of times. Furthermore, $q^*$ will terminate since $S$ is finite.

Now, we show that $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty if and only if AuditNMDP($m$, $b$) returns true. AuditNMDP($m$, $b$) returns true if and only if at least one of the following is true: (1) $b$ is not possible (see Lemma 4), (2) there exists $i \le n$ such that $q^*(m, v^*_m, s_i, a_i) < v^*_m(s_i)$, (3) there exists $i \le n$ such that $q^*(m, v^*_m, s_i, a_i) \le 0$ and $a_i \ne \mathrm{stop}$. At least one of the Conditions (1), (2), or (3) is true if and only if the following is false: $b$ is a possible behavior of $m$, and for all $i \le n$, $q^*_m(s_i, a_i) = v^*_m(s_i)$ and $a_i \ne \mathrm{stop}$ implies that $q^*_m(s_i, a_i) > 0$. Thus, by Lemma 3, AuditNMDP($m$, $b$) returns true if and only if $b$ is not in $\mathrm{nbehv}(m)$. Since $\log^{-1}(b) = \{b\}$, AuditNMDP($m$, $b$) returns true if and only if $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty.

Since AuditNMDP($m$, $b$) always terminates and can only return true or false, and returns true if and only if $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty, AuditNMDP($m$, $b$) returns false if and only if $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is non-empty. □

2.3.2  Running  Time 

The  running  time  of  the  algorithm  is  dominated  by  the  MDP  optimization  conducted  by  SolveMDP. 
SolveMDP  may  be  done  exactly  by  reducing  the  optimization  to  a  system  of  linear  equations  [d’E63]. 
Such  systems  may  be  solved  in  polynomial  time  [Kha79,  Kar84].  However,  in  practice,  large  systems  are 
often  difficult  to  solve.  Fortunately,  a  large  number  of  algorithms  for  making  iterative  approximations  exist 
whose  running  time  depends  on  the  quality  of  the  approximation.  (See  [LDK95]  for  a  discussion.)  In  the 
next  section,  we  discuss  an  implementation  using  such  a  technique. 
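
For orientation, one textbook linear-programming formulation of a discounted MDP is sketched below. It is offered as an assumption about the kind of reduction meant here, not as a transcription of [d'E63]. The unknowns are the values $v(s)$, and $S$, $A$, $t$, $r$, and $\gamma$ are the components of $m$.

```latex
\begin{aligned}
\text{minimize}   \quad & \sum_{s \in S} v(s) \\
\text{subject to} \quad & v(s) \;\ge\; r(s, a) + \gamma \sum_{s' \in S} t(s, a)(s') \, v(s')
                          \qquad \text{for all } s \in S,\ a \in A
\end{aligned}
```

Under this formulation the program has $|S|$ variables and $|S| \cdot |A|$ constraints, and its optimal solution assigns each state $s$ the optimal value $v^*_m(s)$.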

2.4  Approximation  Algorithm  and  Implementation 

Rather than implement the exact algorithm AuditNMDP found in Section 2.3, we implemented an approximation algorithm using the standard value iteration algorithm to solve MDPs (see, e.g., [RN03]). The value iteration algorithm starts with an arbitrary guess of an optimal strategy for an MDP and the value of each state under that strategy. With each iteration, the algorithm improves its estimation of the optimal strategy and its value. It continues to make successively more accurate estimations until the improvement between one iteration and the next is below some threshold $\epsilon$. At this point, the algorithm returns its estimations. The difference between its estimation of the value of each state under the optimal strategy and the true value is bounded by $2\epsilon\gamma/(1 - \gamma)$ where $\gamma$ is the discount factor of the MDP [WB93, WB94]. Each iteration takes $O(|S|^2 \cdot |A|)$ time. The number of iterations needed to reach convergence grows quickly in $\gamma$, making the algorithm pseudo-polynomial time in $\gamma$ and polynomial time in $|A|$ and $|S|$ [Tse90]. Despite the linear programming approach having better worst-case complexity, value iteration tends to perform well in practice. Using value iteration in our algorithm results in it having the same asymptotic running time of pseudo-polynomial in $\gamma$.

AUDITNMDPAPPROX(m = ⟨S, A, t, r, γ⟩, b = [s_1, a_1, s_2, a_2, ..., s_n, a_n]):
21  if (IMPOSSIBLEMDP(m, b))
22      return true  // behavior impossible for NMDP
23  (v*_low, v*_up) := SOLVEMDPAPPROX(m)
24  for (i := 1; i ≤ n; i++):
25      if (q*(m, v*_up, s_i, a_i) < v*_low(s_i)):
26          return true  // action suboptimal
27      if (q*(m, v*_up, s_i, a_i) ≤ 0 and a_i ≠ stop):
28          return true  // action redundant
29  return false

Figure 2.4: The algorithm AuditNMDPapprox. SolveMDPapprox is an MDP approximation algorithm. Figure 2.3 shows ImpossibleMDP.

To maintain soundness, the approximate auditing algorithm differs from the exact algorithm to account for the approximations made by the value-iteration algorithm. Figure 2.4 shows a general framework for auditing with approximation algorithms. SolveMDPapprox is an approximation algorithm for solving MDPs. It returns lower and upper bounds on the value of $v^*_m(s)$ for each state $s$. AuditNMDPapprox uses these bounds to soundly audit.

For example, the auditor may elect to use value iteration for SolveMDPapprox. In this case, $v^*_{\mathrm{low}}(s) = v^*_{\mathrm{app}}(s) - 2\epsilon\gamma/(1 - \gamma)$ and $v^*_{\mathrm{up}}(s) = v^*_{\mathrm{app}}(s) + 2\epsilon\gamma/(1 - \gamma)$ where $v^*_{\mathrm{app}}(s)$ is the value of the approximation returned by value iteration using $\epsilon$ for the accuracy parameter.
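
A minimal Python sketch of value iteration used in this role follows. It is not the Racket implementation described later in this section: the dictionary encoding, the function name `solve_mdp_approx`, and the one-state example are assumptions made for illustration, and the sketch tracks only state values rather than an explicit strategy. The slack added and subtracted is the bound $2\epsilon\gamma/(1 - \gamma)$ quoted above, so the sketch requires $\gamma < 1$.

```python
def solve_mdp_approx(states, actions, t, r, gamma, eps):
    """Value iteration returning (v_low, v_up), bounds on the optimal values.

    t: dict mapping (state, action) to {next_state: probability}.
    r: dict mapping (state, action) to the immediate reward.
    gamma: discount factor, assumed to satisfy 0 <= gamma < 1.
    eps: stop once no state's value changes by eps or more in an iteration.
    """
    v = {s: 0.0 for s in states}  # arbitrary initial guess
    while True:
        new_v = {
            s: max(
                r[(s, a)] + gamma * sum(p * v[s2] for s2, p in t[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        improvement = max(abs(new_v[s] - v[s]) for s in states)
        v = new_v
        if improvement < eps:
            break
    slack = 2 * eps * gamma / (1 - gamma)
    v_low = {s: v[s] - slack for s in states}
    v_up = {s: v[s] + slack for s in states}
    return v_low, v_up


# Hypothetical one-state model: "work" pays 1 forever, so the optimal value
# is 1 / (1 - gamma) = 2 when gamma = 0.5.
one_states = {"s"}
one_actions = {"work", "stop"}
one_t = {("s", "work"): {"s": 1.0}, ("s", "stop"): {"s": 1.0}}
one_r = {("s", "work"): 1.0, ("s", "stop"): 0.0}
low, up = solve_mdp_approx(one_states, one_actions, one_t, one_r, 0.5, 1e-6)
print(low["s"] <= 2.0 <= up["s"])  # True
```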

With  these  changes,  the  approximation  algorithm  is  sound  in  that  it  will  return  true  only  when  the 
original  algorithm  AuditNMDP  solving  the  MDPs  exactly  would  return  true. 

Theorem 3. For all finite NMDPs $m$ and behaviors $b$, if AuditNMDPapprox($m$, $b$) returns true, then $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty.

Proof. If AuditNMDPapprox($m$, $b$) returns true, then one of the following is true: (1) ImpossibleMDP returns true, (2) there exists $i \le n$ such that $q^*(m, v^*_{\mathrm{up}}, s_i, a_i) < v^*_{\mathrm{low}}(s_i)$, or (3) there exists $i \le n$ such that $q^*(m, v^*_{\mathrm{up}}, s_i, a_i) \le 0$ and $a_i \ne \mathrm{stop}$. If (1) is true, then $b$ is not a possible behavior of $m$ by Lemma 4. If (2) is true, then for that $i$, $q^*_m(s_i, a_i) \ne v^*_m(s_i)$ since $q^*_m(s_i, a_i) \le q^*(m, v^*_{\mathrm{up}}, s_i, a_i) < v^*_{\mathrm{low}}(s_i) \le v^*_m(s_i)$. If (3) is true, then for that $i$, $a_i \ne \mathrm{stop}$ does not imply that $q^*_m(s_i, a_i) > 0$ since $a_i \ne \mathrm{stop}$ and $q^*_m(s_i, a_i) \le q^*(m, v^*_{\mathrm{up}}, s_i, a_i) \le 0$. Thus, under each of these cases, Lemma 3 shows that $b = [s_1, a_1, s_2, a_2, \ldots, s_n, a_n]$ is not in $\mathrm{nbehv}(m)$. This fact implies that $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty since $\log^{-1}(b) = \{b\}$. □

AuditNMDPapprox  is  not  complete:  it  may  return  false  in  cases  where  AuditNMDP  would  return 
true. These additional results of false mean that additional violations of exclusivity rules might go uncaught
and additional compliance with prohibitive rules might go unproven. However, since false indicates an
inconclusive  audit,  they  do  not  alter  soundness  of  the  implementation. 

When AuditNMDPapprox returns false, the auditor may use a more accurate approximation algorithm for SolveMDPapprox in hopes that improving the accuracy of the approximations will produce the conclusive response of true. For the value iteration algorithm, the auditor just needs to rerun the algorithm with a lower value for $\epsilon$. There always exists a value of $\epsilon$ small enough to show that $q^*(m, v^*_{\mathrm{up}}, s_i, a_i) < v^*_{\mathrm{low}}(s_i)$ when it is actually the case that $q^*_m(s_i, a_i) < v^*_m(s_i)$. However, when $q^*_m(s_i, a_i) = 0$, there will be no value of $\epsilon$ small enough to make $q^*(m, v^*_{\mathrm{up}}, s_i, a_i) \le 0$ true. Thus, AuditNMDPapprox using value iteration will never catch when $\log^{-1}(b) \cap \mathrm{nbehv}(m)$ is empty because an action of $b$ is redundant but otherwise optimal ($v^*_m(s_i) = q^*_m(s_i, a_i) = 0$ but $a_i \ne \mathrm{stop}$ for some $a_i$).
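
The framework of Figure 2.4 and the incompleteness just described can be illustrated with a short Python sketch. It assumes a dictionary-based encoding of the NMDP and takes the bounds as arguments, so any approximation algorithm can supply them. The function name, the encoding, and the two-state toy model are hypothetical and are not taken from our implementation.

```python
def audit_nmdp_approx(states, actions, t, r, gamma, behavior, v_low, v_up, stop="stop"):
    """Sound but incomplete audit: True means the behavior is not in nbehv(m).

    t, r: dicts keyed by (state, action), defined for every such pair.
    behavior: list of (state, action) pairs.
    v_low, v_up: dicts giving lower and upper bounds on each optimal value.
    """
    # Line 21: the behavior must be possible for the model.
    chosen = {}
    for i, (s, a) in enumerate(behavior):
        if s not in states or a not in actions:
            return True
        if chosen.setdefault(s, a) != a:
            return True
        if i + 1 < len(behavior) and t[(s, a)].get(behavior[i + 1][0], 0.0) <= 0:
            return True
    # Lines 24-28: compare an upper bound on q* with the lower bound on v*.
    for s, a in behavior:
        q_up = r[(s, a)] + gamma * sum(p * v_up[s2] for s2, p in t[(s, a)].items())
        if q_up < v_low[s]:
            return True  # action suboptimal
        if q_up <= 0 and a != stop:
            return True  # action redundant
    return False


# Hypothetical model: from s1, "good" earns 10 and "bad" earns 1; both lead
# to s2, where every action is a zero-reward self-loop.
g_states = {"s1", "s2"}
g_actions = {"good", "bad", "stop"}
g_t = {(s, a): {"s2": 1.0} for s in g_states for a in g_actions}
g_t[("s1", "stop")] = {"s1": 1.0}
g_r = {(s, a): 0.0 for s in g_states for a in g_actions}
g_r[("s1", "good")] = 10.0
g_r[("s1", "bad")] = 1.0
exact = {"s1": 10.0, "s2": 0.0}          # optimal values, computed by hand
slack = 1e-3                             # stands in for 2*eps*gamma/(1-gamma)
g_low = {s: x - slack for s, x in exact.items()}
g_up = {s: x + slack for s, x in exact.items()}

# Suboptimal action: caught despite the approximation.
print(audit_nmdp_approx(g_states, g_actions, g_t, g_r, 0.9,
                        [("s1", "bad"), ("s2", "stop")], g_low, g_up))  # True
# Redundant but otherwise optimal action: missed for any positive slack.
print(audit_nmdp_approx(g_states, g_actions, g_t, g_r, 0.9,
                        [("s2", "good")], g_low, g_up))                 # False
```

Passing the exact optimal values as both bounds recovers the checks of Figure 2.2, in which case the redundant action in the second call is caught.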

We  programmed  our  implementation  in  the  Racket  dialect  of  Scheme  [FP10].  The  implementation  is 
available  at: 

http://www.cs.cmu.edu/~mtschant/thesis/

The implementation uses an explicit representation of the state and action spaces. The transition and reward functions are represented using hash maps. Since we did not optimize the implementation, we did not benchmark its performance. However, in Section 4.5, we use it to aid understanding a complex example and report its performance in that section.

Chapter 3


Information  Use  for  a  Purpose 


3.1  Actions  Using  Information 

In Chapter 2, we saw how to formalize performing an action for a purpose. In this chapter, we consider how to formalize using information for a purpose. Before providing an overview of this chapter in Section 3.1.3, we present two examples that motivate our approach. The first example (Section 3.1.1) illustrates the difficulties of modeling information use as an action in the sense of Chapter 2 even when the information used is easily mapped to specific actions. After finding this approach unsatisfactory, we introduce Partially Observable Markov Decision Processes (POMDPs) to provide a formalization of information use. In the second example (Section 3.1.2), we show that POMDPs are sufficient even when the information used cannot be directly mapped to actions. Since this section is motivational, we defer all formalism to later sections.

3.1.1  Physician  Example:  Parametric  Information  Use 

An auditee may perform actions that manipulate, save, transmit, collect, or otherwise involve information governed by a privacy policy. Consider the example in Section 2.1.3. Here the action send involves transmitting the information conveyed or revealed by the X-ray. Thus, the formalism presented in Chapter 2 may already deal with actions involving information. The auditor may use this formalism to determine that an auditee used information for a purpose by determining that the auditee performed an action involving that information for that purpose.

However,  the  involvement  of  information  is  not  explicit.  Rather,  the  auditor  must  track  information 
with  means  outside  the  formal  model.  For  example,  the  auditor  might  informally  determine  which  actions 
involve  information  and  supply  suggestive  names  to  identify  them  such  as  send.  While  the  auditor  may 
understand  from  context  that  the  action  send  involves  the  information  in  the  X-ray,  the  auditor  may  desire  a 
more  detailed  model  that  makes  this  information  usage  explicit.  Furthermore,  when  the  usage  of  information 
by  a  system  is  unclear,  the  auditor  needs  a  model  of  the  system  to  determine  its  information  usage. 

Such a model could be constructed by modeling the different values that the X-ray could take on and modeling each as an input such as in work on noninterference [GM82]. However, this model would be disconnected from the formal models presented in the previous chapter: the MDP formalism is incompatible with the nondeterminism used in these models for inputs since such nondeterminism makes evaluating a policy impossible. Nevertheless, we may adapt such a model to the MDP setting by instead encoding these inputs in the probabilistic transitions of the model. Such an encoding requires assigning a probability distribution over the possible inputs.

Figure 3.1: MDP making the involvement of information explicit for the physician example. This figure uses the same conventions as earlier figures of MDPs, such as Figure 2.1.

Figure 3.1 shows such a model. In this example, from the point of view of the physician deciding whether to send the X-ray to the specialist, only two classes of X-ray exist: X-rays from which the physician himself can form a diagnosis and those from which he cannot. Thus, as a simplification, in our model, we employ only two different X-rays. (Alternatively, one can view these two X-rays as two classes of X-rays where an X-ray is put into one or the other class depending upon whether the physician can form a diagnosis from the X-ray.) The physician can form a diagnosis from the X-ray $x_1$, but not $x_2$. To see this difference in the model, note that, from state $s_1$, the action diagnose leads to a reward of 12 and to a new state $s_5$, whereas, from the state $s_2$, the action diagnose is a self-loop of zero reward (which is inferred from the absence of the action labeling any arrow leaving the state $s_2$ in the figure). The single send action is replaced with one for each value of the X-ray.

Despite the X-rays $x_1$ and $x_2$ intuitively corresponding to inputs to the physician or observations made by the physician, the physician becoming aware of which of the two possible X-rays he has taken is not explicitly represented in the model as an input or an action. Rather, these inputs determine which of the states $s_1$ or $s_2$ the physician reaches. For example, input $x_1$ results in the transition to the state $s_1$. Since the physician may observe what state he is in, he may learn the value of the input (i.e., which X-ray he took). In this simple model, we assume that the physician can only send that X-ray. In some systems, the physician might have a choice of which X-ray to send.

Consider a physician who plans to send the X-ray to a specialist if and only if it is $x_2$. Using a formalization similar to noninterference [GM82], we can determine that his plan uses the X-ray. However, such a formalism would be unnatural as it relies on modeling inputs, which do not show up in the MDP model. Thus, we instead switch to the formalism of Partially Observable Markov Decision Processes (POMDPs). Under the POMDP model, the agent using the model to plan does not know a priori what state it is in. Rather, the agent makes observations that adjust its beliefs about its current state. The input of the X-ray would be modeled as such an observation. Furthermore, rather than modeling the uncertainty over which value of the X-ray will be produced as a probabilistic transition, a POMDP model represents such uncertainty as uncertainty over the initial state of the system.
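
How an observation adjusts the agent's beliefs can be sketched with the standard Bayesian belief update for POMDPs. The sketch below uses a textbook convention and is not the formalism developed in later sections: the observation model `obs[(action, next_state)]`, the state names, and the initial probabilities 0.75 and 0.25 are all assumptions chosen for illustration.

```python
def update_belief(belief, action, observation, t, obs):
    """Bayesian belief update after taking `action` and seeing `observation`.

    belief: dict mapping each state to its probability.
    t:   dict mapping (state, action) to {next_state: probability}.
    obs: dict mapping (action, next_state) to {observation: probability}.
    """
    unnormalized = {}
    for s, p_s in belief.items():
        for s_next, p_t in t[(s, action)].items():
            p_o = obs[(action, s_next)].get(observation, 0.0)
            unnormalized[s_next] = unnormalized.get(s_next, 0.0) + p_s * p_t * p_o
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: w / total for s, w in unnormalized.items() if w > 0}


# Hypothetical model: the patient has one of two conditions (c1 or c2), and
# taking the X-ray reveals which by producing observation x1 or x2.
xray_t = {("c1", "take"): {"d1": 1.0}, ("c2", "take"): {"d2": 1.0}}
xray_obs = {("take", "d1"): {"x1": 1.0}, ("take", "d2"): {"x2": 1.0}}
prior = {"c1": 0.75, "c2": 0.25}
print(update_belief(prior, "take", "x2", xray_t, xray_obs))  # {'d2': 1.0}
```

In this toy model a single observation removes all uncertainty, mirroring the way the physician learns which X-ray he has taken.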

Figure 3.2 shows a POMDP model of the above example. Unlike the MDP model (shown in Figure 3.1), the POMDP model explicitly represents the physician receiving the X-ray as an observation. The observations $x_1$ and $x_2$ that label the probabilistic transitions following the action take, which models taking the X-ray, represent which value of the X-ray the physician observes. The value seen by the physician is determined by which of the possible initial states is the actual initial state of the environment. Intuitively, each initial state represents a different condition the patient may have that will affect the value of the X-ray. As the physician does not know a priori what state he is in, he must take the X-ray to determine which of these conditions afflicts the patient. Note that the dummy observation $o$ labels transitions that produce no additional information other than that a transition occurred.

Figure 3.2: POMDP $m_{\mathrm{phy}}$ making the involvement of information explicit for the physician example. This figure is suggestive of the POMDP model of the physician example. While focusing on the key features of the POMDP, it is not a complete representation. See Section 3.2.2 for the complete description of the POMDP $m_{\mathrm{phy}}$ and further discussion of the figure.

The figure follows the conventions of our figures showing MDPs: Circles represent states, block arrows denote possible actions, and squiggly arrows denote probabilistic outcomes including their probability (under the arrow); self-loops of zero reward are not shown. Additionally, each probabilistic outcome (squiggly arrow) is labeled with the observation that accompanies it (this convention does not generalize well for more complex POMDPs). To the left of the possible initial states $s_1$ and $s_2$ are the probabilities of the physician starting in each of these states. Strictly speaking, these probabilities are not part of the POMDP model, but rather part of the physician's initial beliefs about the state of the POMDP. However, we include these beliefs in this figure since, in this example, the physician's initial beliefs are known to the auditor.

Intuitively, if the physician were to ignore whether he makes the observation $x_1$ or $x_2$ after taking the
X-ray, then he would not learn which of the two possible conditions afflicts the patient. Furthermore, he would
not learn whether he can make a diagnosis from the X-ray. To compensate, the physician would always have to
send the X-ray to the specialist to ensure a diagnosis. This difference in behavior may enable an auditor to
learn whether the physician used the X-ray.

Up until now, we have only seen information affecting the action chosen by the physician parametrically.
That is, after the physician observes $x_2$, he chooses the action $\mathsf{send}_{x_2}$, an action labeled with the used
information. This might lead an auditor to conclude that only actions labeled by a piece of information
(such as $x_2$ labeling $\mathsf{send}_{x_2}$) involve information. However, the parametric view of information use may
be generalized. In some cases, the agent may use information to choose among completely different courses
of action. For example, upon seeing a certain value for the X-ray, the physician may decide to perform
emergency surgery rather than sending the X-ray for further analysis.

This distinction is similar to one made in works on checking programs for noninterference: the first example,
shown in Figure 3.2 (parametric), suggests direct information flow or a data dependence in which
a variable takes on the value of some input. The second example, involving emergency surgery (non-parametric),
suggests indirect information flow or a control dependence in which a control statement affects
the value of a variable by affecting whether an assignment to that variable executes (e.g., [FOW87]). While
formalisms such as noninterference do not apply to our optimization models (MDPs and POMDPs), we
would like to ensure that our formalization of information use can handle all forms of information use.
Fortunately,  the  POMDP  model  naturally  captures  both  parametric  and  non-parametric  use  of  information. 
The  next  example  illustrates  modeling  non-parametric  information  flow  with  a  POMDP. 

3.1.2  Advertising  Example:  Non-Parametric  Information  Use 

The  previous  example  dealt  with  parametric  information  use  in  which  the  actions  that  involve  information 
directly  show  that  information  (i.e.,  the  actions  are  parametrically  labeled  with  observations  relevant  to  a 
class  of  information).  In  this  example,  the  agent  uses  information  but  none  of  its  actions  directly  show  that 
information. 

Consider a website attempting to determine which advertisement to show a website visitor. The website
has  access  to  a  database  of  information  about  potential  visitors  that  it  can  use  to  select  advertisements.  Since 
some  advertisements  are  more  effective  for  some  demographics  than  others,  it  is  in  the  website’s  interest  to 
use  this  information. 

However,  a  privacy  policy  governs  what  information  the  website  may  use  for  the  purpose  of  marketing, 
including  determining  the  advertisement  to  show  the  visitor.  The  policy  states 

We  will  not  use  information  you  provide  about  your  sex  for  the  purpose  of  marketing. 

Since the website's database entry for a visitor's sex is created from the information that the visitor
provides to the website, the policy prohibits the use of the database's entry for the visitor's sex to select
advertisements.

For simplicity, we assume that the only information relevant to advertising is the sex of the visitor. We
further assume that the website is choosing among three advertisements: $\mathsf{ad}_1$, $\mathsf{ad}_2$, and $\mathsf{ad}_3$. We presume
$\mathsf{ad}_1$ is the best for females and the worst for males (on average), $\mathsf{ad}_3$ is the best for males and the worst for
females, and $\mathsf{ad}_2$ strikes a middle ground. We further presume that these advertisements are generic and do
not directly contain any information about the visitor to whom the website shows them.

Figure 3.3 shows part of a POMDP model $m_{adv}$ of such an advertising website. Unlike the POMDP
$m_{phy}$ of the physician example in Figure 3.2, the state space of this model is factored: it is a tuple with each
component representing some factor about the state. This state space involves three factors (components of
the tuple). The first factor is $\mathsf{f}$ if the visitor is a female and $\mathsf{m}$ if the visitor is male. The second factor shows
what the database records about the visitor (with $\bot$ representing that the database does not have a record
for the visitor). The third factor records what advertisement the website has shown to the visitor (with $\emptyset$
indicating that the website has not shown an advertisement). For example, the state $\langle \mathsf{f}, \bot, \mathsf{ad}_2 \rangle$ indicates that
the visitor is a female, the database does not record her sex, and the website has shown her $\mathsf{ad}_2$.

Intuitively, we would expect that the website will first perform the action lookup and show $\mathsf{ad}_1$ to a
female and $\mathsf{ad}_3$ to a male. Our reasoning is that this plan maximizes the effectiveness of the advertisements.
However, how the website will actually behave depends upon the website's initial beliefs. For example, if
the website believes that all visitors are female or that the database is inaccurate, it will not bother to check
the database.

Figure 3.3: POMDP model $m_{adv}$ of the advertising example. This figure uses the same conventions as our other
POMDP figure shown in Figure 3.2. Due to space constraints, we do not show states that correspond to the database
being incorrect, such as $\langle \mathsf{f}, \mathsf{m}, \emptyset \rangle$.

The behavior we intuitively expect corresponds to the initial beliefs we implicitly presumed from the
example description: that the database is accurate (e.g., assigns a low probability to states such as $\langle \mathsf{f}, \mathsf{m}, \emptyset \rangle$),
that the database is useful (e.g., assigns a higher probability to the state $\langle \mathsf{f}, \mathsf{f}, \emptyset \rangle$ than the state $\langle \mathsf{f}, \bot, \emptyset \rangle$),
that the website has not shown an advertisement to the visitor yet (i.e., assigns zero probability to states of
the form $\langle g, d, \mathsf{ad}_i \rangle$), and that visitors are equally likely to be female or male. Under these presumptions,
optimal strategies behave as we intuitively expect: The website first checks whether the database contains
information about the visitor. If the database records that the visitor is a female, then the website shows her
$\mathsf{ad}_1$. If it records a male, the website shows $\mathsf{ad}_3$. If the database does not contain the visitor's sex (holds $\bot$),
then the website shows $\mathsf{ad}_2$.

3.1.3  Summary  of  Chapter 

The remainder of this chapter makes the intuitions already introduced formal. In particular, we discuss using
the  POMDP  model  to  audit  purpose  restrictions  involving  the  use  of  information.  To  do  so,  we  must  provide 
a  semantics  of  information  use.  To  that  end,  we  note  that  purpose  restrictions  typically  do  not  govern  the 
use  of  knowledge  gained  from  information  sources  not  mentioned  by  the  policy  even  if  this  knowledge  can 
also  be  inferred  from  information  sources  prohibited  by  the  policy.  For  example,  Yahoo!  promises  not  to 
use  the  contents  of  a  user’s  emails  for  the  purpose  of  marketing.  We  expect  this  to  prohibit  Yahoo!  from 
reading  a  user’s  emails  to  determine  what  advertisements  to  show  him.  However,  we  do  not  expect  Yahoo! 
to  avoid  using  for  marketing  knowledge  of  the  user  that  it  has  collected  by  other  means  even  if  some  of  this 
knowledge  is  separately  implied  by  the  user’s  emails. 

Thus, a purpose restriction limits the use of information from a source rather than some class of knowledge.
Limiting information from a source includes limiting any direct observations of that source or inferences
that would be impossible without such observations. Understanding these restrictions does not require
epistemic models of knowledge (e.g., [FHMV95]) nor fine-grain inference control (e.g., [FJ02]). Rather,
similar to how noninterference characterizes information use for computer programs [GM82], these restrictions
require understanding how observations of information change the agent's behavior. However, whereas
noninterference starts with the automaton model of programs, enforcing purpose restrictions requires understanding
a purpose-driven planning agent with a model such as the POMDP model. The POMDP model
allows  us  to  model  the  agent’s  environment  with  the  purpose  in  question  defining  the  reward  function  of  the 
POMDP  (Section  3.2). 

The  explicitness  of  partial  observations  in  the  POMDP  model  allows  us  to  formalize  information  use 
by  considering  how  the  agent  would  plan  if  some  observations  were  conflated  to  ignore  information  of 
interest  (Section  3.3).  We  do  so  by  quotienting  the  space  of  observations  by  an  equivalence  relation  that 
treats  two  observations  as  indistinguishable  if  they  only  differ  by  information  whose  use  is  prohibited  by  the 
purpose  restriction.  By  ignoring  this  distinguishing  information,  we  simulate  ignorance  of  the  information. 
Such  quotienting  is  well-defined  for  POMDPs  since  observations  only  probabilistically  constrain  the  space 
of  possible  current  states  of  the  agent’s  environment,  and  quotienting  just  decreases  the  accuracy  of  this 
constraining. 
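
As a rough illustration of this conflation idea (the formal definition is deferred to Section 3.3), the following Python sketch merges observations that an equivalence relation treats as indistinguishable. The names `quotient_nu`, `cls`, and `ignore_db_entry` are hypothetical, the strings `"dummy"` and `None` are stand-ins for the dummy observation and for "no advertisement shown", and encoding the equivalence relation as a function from observations to class labels is an assumption of this sketch. Which observations to conflate is a modeling choice; here every database entry is merged.

```python
from collections import defaultdict


def quotient_nu(nu, cls):
    """Wrap an observation function so that equivalent observations are merged.

    nu(a, s_next) returns a dict mapping observations to probabilities.
    cls(o) returns a label for the equivalence class of observation o
    (a hypothetical encoding of the equivalence relation).
    """
    def nu_quotiented(a, s_next):
        merged = defaultdict(float)
        for o, p in nu(a, s_next).items():
            merged[cls(o)] += p  # a class receives the total mass of its members
        return dict(merged)
    return nu_quotiented


# Advertising example: a lookup reveals (database entry, advertisement shown);
# every other action yields only a dummy observation.
def nu(a, s_next):
    g, d, shown = s_next
    return {(d, shown): 1.0} if a == "lookup" else {"dummy": 1.0}


# Assumed equivalence: observations that differ only in the database entry are conflated.
def ignore_db_entry(o):
    return ("*", o[1]) if isinstance(o, tuple) else o


nu_ignorant = quotient_nu(nu, ignore_db_entry)
print(nu_ignorant("lookup", ("f", "f", None)))
print(nu_ignorant("lookup", ("m", "m", None)))
```

Under the quotiented observation function, a lookup yields the same observation for a recorded female as for a recorded male, so planning with it simulates ignorance of the database entry.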

We test whether an agent uses information for a purpose by comparing the behaviors of the agent to
the behaviors it would manifest had it planned its actions in this simulated state of ignorance (Section 3.4).
We provide an auditing algorithm using our formalism to compare the behavior of an agent to how it would
behave under such ignorance (Section 3.5). Our algorithm uses an off-the-shelf approximation algorithm for
POMDPs. Our algorithm automates much of the enforcement of purpose restrictions governing information
use.

Throughout  this  chapter  we  employ  the  two  examples  already  introduced.  These  examples  together 
show  that  our  formalism  can  handle  both  parametric  and  non-parametric  information  use. 

3.2  Planning  under  Partial  Observations 

To  model  information  use,  we  first  present  the  formalism  of  Partially  Observable  Markov  Decision  Processes 
(POMDPs)  [Son71].  In  particular,  we  present  previous  work  on  the  formal  model  and  on  how  to  reduce  the 
optimization  of  POMDPs  to  the  optimization  of  a  related  belief  MDP.  We  then  present  two  examples  and 
discuss  applying  the  idea  of  non-redundancy  to  this  model. 

In general, an agent planning for some purpose constructs a POMDP to help select its actions. The
POMDP models the agent's environment and how its actions affect the environment's state and the satisfaction
of the purpose it is pursuing. The agent selects a plan that optimizes the expected total discounted
reward (degree of purpose satisfaction) under the POMDP. (For a survey, see [Mon82].)

3.2.1  Partially  Observable  Markov  Decision  Processes 

A POMDP is a tuple $\langle \mathcal{S}, \mathcal{A}, t, r, \mathcal{O}, \nu, \gamma \rangle$ where

• $\mathcal{S}$ is a finite state space;

• $\mathcal{A}$, a finite set of actions;

• $t : \mathcal{S} \times \mathcal{A} \to \mathrm{Dist}(\mathcal{S})$, a transition function from a state and an action to a distribution over states;

• $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, a reward function;

• $\mathcal{O}$, a finite observation space containing any observations the agent may perceive while performing
actions;

• $\nu : \mathcal{A} \times \mathcal{S} \to \mathrm{Dist}(\mathcal{O})$, a distribution over observations given an action and the state resulting from
performing that action; and

• $\gamma$, a discount factor such that $0 < \gamma < 1$.
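
To make the shape of this tuple concrete, here is a minimal Python sketch of one possible encoding. The class name `POMDP`, the field names, the helper `degen`, and the choice to represent finite distributions as dictionaries from outcomes to probabilities are all assumptions of this sketch rather than part of the formal model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Sequence

Dist = Dict[Hashable, float]  # finite distribution: outcome -> probability


def degen(x: Hashable) -> Dist:
    """Degenerate distribution that puts all probability on x."""
    return {x: 1.0}


@dataclass(frozen=True)
class POMDP:
    states: Sequence[Hashable]                  # S
    actions: Sequence[Hashable]                 # A
    t: Callable[[Hashable, Hashable], Dist]     # t(s, a): distribution over next states
    r: Callable[[Hashable, Hashable], float]    # r(s, a): real-valued reward
    observations: Sequence[Hashable]            # O
    nu: Callable[[Hashable, Hashable], Dist]    # nu(a, s'): distribution over observations
    gamma: float                                # discount factor

    def __post_init__(self) -> None:
        # The bound mirrors the constraint on the discount factor stated in the text.
        if not 0 < self.gamma < 1:
            raise ValueError("discount factor must satisfy 0 < gamma < 1")


# A toy one-state model, only to show how the pieces fit together.
toy = POMDP(
    states=["s"],
    actions=["wait"],
    t=lambda s, a: degen("s"),
    r=lambda s, a: 1.0,
    observations=["o"],
    nu=lambda a, s_next: degen("o"),
    gamma=0.9,
)
print(toy.t("s", "wait"), toy.nu("wait", "s"))
```

Note that `nu` takes the action first and the resulting state second, matching the argument order of $\nu$ in the definition.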

An execution of a POMDP $m$ is an infinite interleaving of states, actions, and observations. For example,
$[s_1, a_1, o_1, s_2, a_2, o_2, \ldots]$ is an execution in which the modeled agent started in the state $s_1$ from which the
agent performs action $a_1$ causing the observation $o_1$ and the agent's environment transitions to state $s_2$ from
which the agent performs action $a_2$ causing the observation $o_2$, and so forth.

An agent does not know a priori which of the possible states of the POMDP is the current state of its
environment. Rather it holds beliefs about which state is the current state. In particular, the agent assigns a
probability to each state $s$ according to how likely the agent believes that the current state is the state $s$. A
belief state $\beta$ captures these beliefs as a distribution over states of $\mathcal{S}$ (i.e., $\beta \in \mathrm{Dist}(\mathcal{S})$). An agent's belief
state is updated as it performs actions and makes observations. When an agent takes the action $a$ and makes
the observation $o$ starting with the beliefs $\beta$, the agent develops the new beliefs $\beta'$ (also a distribution over
$\mathcal{S}$). $\beta'(s')$ is the probability that $s'$ is the next state.

We define $\mathrm{update}_m(\beta, a, o)$ to equal the updated beliefs $\beta'$. $\beta'$ assigns to the state $s'$ the probability
$\beta'(s') = \Pr[S'=s' \mid O=o, A=a, B=\beta]$ where $S'$ is a random variable over next states, $B=\beta$ identifies
the agent's current belief state as $\beta$, $A=a$ identifies the agent's current action as $a$, and $O=o$ identifies the
observation the agent makes while performing action $a$ as $o$. To compute the value of $\mathrm{update}_m(\beta, a, o)$, we
may reduce it to a formula in terms of the POMDP model $m$ as follows:

\begin{align*}
& \mathrm{update}_m(\beta, a, o)(s') \tag{3.1} \\
&= \Pr[S'=s' \mid O=o, A=a, B=\beta] \tag{3.2} \\
&= \frac{\Pr[O=o \mid S'=s', A=a, B=\beta] \, \Pr[S'=s' \mid A=a, B=\beta]}{\Pr[O=o \mid A=a, B=\beta]} \tag{3.3} \\
&= \frac{\Pr[O=o \mid S'=s', A=a, B=\beta] \sum_{s \in \mathcal{S}} \Pr[S=s \mid A=a, B=\beta] \, \Pr[S'=s' \mid A=a, B=\beta, S=s]}{\Pr[O=o \mid A=a, B=\beta]} \tag{3.4} \\
&= \frac{\nu(a, s')(o) \sum_{s \in \mathcal{S}} \beta(s) * t(s, a)(s')}{\Pr[O=o \mid A=a, B=\beta]} \tag{3.5}
\end{align*}

Line 3.4 follows since the POMDP can have only one current state, making different possible current states
mutually exclusive events. That is, for any two differing states $s_1$ and $s_2$, the event of the current state being
$s_1$ (i.e., $S = s_1$) and the event of the current state being $s_2$ (i.e., $S = s_2$) are mutually exclusive. Since
$\Pr[O=o \mid A=a, B=\beta]$ is independent of $s'$, it may be treated as a normalization factor equal to

\[
\sum_{s' \in \mathcal{S}} \nu(a, s')(o) \sum_{s \in \mathcal{S}} \beta(s) * t(s, a)(s') \tag{3.6}
\]
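
The belief update of Equations 3.5 and 3.6 can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `update`, the dictionary encoding of distributions, and the two-state test model (a noisy test for whether a patient is ill) are assumptions introduced here and do not come from the examples of this chapter.

```python
def update(states, t, nu, belief, a, o):
    """Return update_m(belief, a, o), following Equations 3.5 and 3.6.

    t(s, a) and nu(a, s_next) return dicts mapping outcomes to probabilities;
    belief maps states to probabilities.
    """
    unnormalized = {}
    for s_next in states:
        # Sum over current states: probability of reaching s_next under action a.
        predicted = sum(belief.get(s, 0.0) * t(s, a).get(s_next, 0.0) for s in states)
        # Numerator of Equation 3.5.
        unnormalized[s_next] = nu(a, s_next).get(o, 0.0) * predicted
    norm = sum(unnormalized.values())  # Equation 3.6: Pr[O=o | A=a, B=belief]
    if norm == 0:
        raise ValueError("observation has probability zero under this belief and action")
    return {s: p / norm for s, p in unnormalized.items() if p > 0}


# Hypothetical model: testing does not change the patient's condition,
# and the test reports the true condition with probability 0.8.
STATES = ["healthy", "ill"]


def t(s, a):
    return {s: 1.0}


def nu(a, s_next):
    if s_next == "ill":
        return {"pos": 0.8, "neg": 0.2}
    return {"pos": 0.2, "neg": 0.8}


prior = {"healthy": 0.5, "ill": 0.5}
posterior = update(STATES, t, nu, prior, "test", "pos")
print(posterior)  # the probability of "ill" rises from 0.5 to about 0.8
```

The normalization factor is computed once from the unnormalized numerators, which is exactly why Equation 3.6 need not be evaluated separately for each $s'$.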


Similar to MDPs, the agent does not need to track its history of actions and observations independently
of its beliefs as such beliefs are a sufficient statistic. Thus, the agent's strategies need only deal with beliefs
and are formalized as a function from beliefs to actions. That is, the space of possible strategies that the agent
may employ given a POMDP is $\mathrm{Dist}(\mathcal{S}) \to \mathcal{A}$.

The goal of the agent is to find the optimal strategy. By the Bellman equation [Bel52], the expected value
of a belief state $\beta$ under a strategy $\sigma$ for $m = \langle \mathcal{S}, \mathcal{A}, t, r, \mathcal{O}, \nu, \gamma \rangle$ is

\[
V_m(\sigma, \beta) = R_m(\beta, \sigma(\beta)) + \gamma \sum_{o \in \mathcal{O}} N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \mathrm{update}_m(\beta, \sigma(\beta), o))
\]

where $R$ and $N$ are $r$ and $\nu$ raised to work over beliefs:

\[
R_m(\beta, a) = \sum_{s \in \mathcal{S}} \beta(s) * r(s, a) \tag{3.7}
\]

and

\[
N_m(\beta, a)(o) = \Pr[O=o \mid B=\beta, A=a] = \sum_{s \in \mathcal{S}} \beta(s) * \sum_{s' \in \mathcal{S}} t(s, a)(s') * \nu(a, s')(o) \tag{3.8}
\]

A strategy $\sigma$ is optimal if it is optimal for every belief state $\beta$ (i.e., if for all $\beta$, $V_m(\sigma, \beta)$ equals $V^*_m(\beta) =
\max_{\sigma'} V_m(\sigma', \beta)$).
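
A small Python sketch may help in reading Equations 3.7 and 3.8, which lift the reward function and the observation function from states to beliefs. The function names `lifted_reward` and `observation_prob`, the dictionary encoding, and the two-state test model are assumptions made for illustration only.

```python
def lifted_reward(states, r, belief, a):
    """R_m(belief, a) from Equation 3.7: the expected immediate reward."""
    return sum(belief.get(s, 0.0) * r(s, a) for s in states)


def observation_prob(states, t, nu, belief, a, o):
    """N_m(belief, a)(o) from Equation 3.8: the probability of observing o."""
    return sum(
        belief.get(s, 0.0) * t(s, a).get(s_next, 0.0) * nu(a, s_next).get(o, 0.0)
        for s in states
        for s_next in states
    )


# Hypothetical model: a noisy test that leaves the hidden state unchanged.
STATES = ["healthy", "ill"]


def t(s, a):
    return {s: 1.0}


def r(s, a):
    return 1.0 if s == "healthy" else 0.0


def nu(a, s_next):
    if s_next == "ill":
        return {"pos": 0.8, "neg": 0.2}
    return {"pos": 0.2, "neg": 0.8}


belief = {"healthy": 0.5, "ill": 0.5}
print(lifted_reward(STATES, r, belief, "test"))
print(observation_prob(STATES, t, nu, belief, "test", "pos"))
```

Both quantities are plain expectations under the belief, so they can be evaluated without enumerating executions of the POMDP.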

We are also interested in the optimal value of performing an action from a belief state. For a POMDP
$m$, we define the quality of an action given a belief state as follows:

\[
Q^*_m(\beta, a) = R_m(\beta, a) + \gamma \sum_{o \in \mathcal{O}} N_m(\beta, a)(o) * V^*_m(\mathrm{update}_m(\beta, a, o))
\]

An action $a$ is optimal for a belief state $\beta$ if and only if $Q^*_m(\beta, a) = V^*_m(\beta)$.

The  theory  of  POMDPs  reduces  the  process  of  finding  the  optimal  strategy  for  a  POMDP  to  that  of 
finding  the  optimal  strategy  for  a  related  MDP  that  uses  belief  states  as  its  state  space  (e.g.,  [Son78]).  This 
reduction  allows  us  to  reuse  the  theory  of  MDPs  to  find  the  optimal  strategies  of  POMDPs.  Let  bmdp  be  a 
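function that takes a POMDP and produces its belief MDP. We define bmdp as follows:

\[
\mathrm{bmdp}(\langle \mathcal{S}, \mathcal{A}, t, r, \mathcal{O}, \nu, \gamma \rangle) = \langle \mathcal{B}, \mathcal{A}, \tau, R_m, \gamma \rangle
\]

where $R_m$ is as defined in Equation 3.7, the (infinite) state space $\mathcal{B}$ is the set of all possible beliefs $\mathrm{Dist}(\mathcal{S})$,
and $\tau : \mathcal{B} \times \mathcal{A} \to \mathrm{Dist}(\mathcal{B})$ is a transition function that depends upon the agent's beliefs and observations:

\[
\tau(\beta, a)(\beta') = \Pr[B'=\beta' \mid B=\beta, A=a]
\]

for all $\beta$ and $\beta'$ in $\mathcal{B}$ and $a$ in $\mathcal{A}$.

To compute the value of $\tau(\beta, a)(\beta')$, we reduce it to features of the POMDP model $m$. In particular, we
express $\tau(\beta, a)(\beta')$ as a formula of $N_m$ as defined in Equation 3.8 and $\mathcal{O}_m(\beta, a, \beta')$, the set of observations
that can accompany a belief state $\beta$ transitioning under the action $a$ to the belief state $\beta'$ under the model $m$:

\[
\mathcal{O}_m(\beta, a, \beta') = \{\, o \in \mathcal{O} \mid \mathrm{update}_m(\beta, a, o) = \beta' \,\}.
\]

\begin{align*}
\tau(\beta, a)(\beta') &= \Pr[B'=\beta' \mid B=\beta, A=a] \\
&= \sum_{o \in \mathcal{O}_m(\beta, a, \beta')} N_m(\beta, a)(o)
\end{align*}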
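
The following Python sketch illustrates how the belief-MDP transition groups observations by the belief they lead to and sums their probabilities. The function name `belief_transition`, the encoding of a belief as a sorted tuple of (state, probability) pairs, the rounding used to merge numerically equal beliefs, and the two-state model are all assumptions of this sketch.

```python
def belief_transition(states, observations, t, nu, belief, a):
    """Return tau(belief, a) as a dict from next beliefs to probabilities.

    Each next belief is encoded as a sorted tuple of (state, probability) pairs
    so that it can serve as a dictionary key. Probabilities are rounded so that
    beliefs that differ only by floating-point noise are merged.
    """
    result = {}
    for o in observations:
        joint = {}
        for s_next in states:
            reach = sum(belief.get(s, 0.0) * t(s, a).get(s_next, 0.0) for s in states)
            p = nu(a, s_next).get(o, 0.0) * reach
            if p > 0:
                joint[s_next] = p
        n = sum(joint.values())  # N_m(belief, a)(o)
        if n == 0:
            continue  # o cannot occur, so it contributes to no next belief
        key = tuple(sorted((s, round(p / n, 12)) for s, p in joint.items()))
        result[key] = result.get(key, 0.0) + n  # sum over observations leading to this belief
    return result


STATES = ["healthy", "ill"]
OBS = ["pos", "neg", "x", "y"]


def t(s, a):
    return {s: 1.0}


def nu(a, s_next):
    if a == "wait":  # two uninformative observations, each equally likely
        return {"x": 0.5, "y": 0.5}
    return {"pos": 0.8, "neg": 0.2} if s_next == "ill" else {"pos": 0.2, "neg": 0.8}


belief = {"healthy": 0.5, "ill": 0.5}
print(belief_transition(STATES, OBS, t, nu, belief, "wait"))  # one next belief
print(belief_transition(STATES, OBS, t, nu, belief, "test"))  # two next beliefs
```

Under the hypothetical action `wait`, both observations leave the belief unchanged, so their probabilities add up on a single successor belief; under `test`, each observation leads to a distinct successor.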

For  a  POMDP  m,  the  optimal  strategy  of  bmdp(m)  is  a  function  from  the  state  space  of  bmdp(m)  to  an 
action.  As  the  state  space  is  the  set  of  all  beliefs  about  states  in  m,  such  a  strategy  is  also  a  strategy  for  the 
POMDP  m.  Furthermore,  the  optimal  strategy  of  bmdp(m)  is  also  the  optimal  strategy  of  m.  The  optimal 
value  that  a  belief  MDP  bmdp(m)  assigns  to  a  belief  state  is  equal  to  the  optimal  value  that  the  POMDP 
m  assigns  to  it.  This  result,  proved  as  Proposition  3  below,  implies  that  the  belief  MDP  is  equivalent  to  the 
POMDP  in  that  each  shares  the  same  optimal  strategies. 

Proposition 3. For all POMDPs $m$, for all belief states $\beta$ and actions $a$ of $m$, $V^*_m(\beta) = V^*_{\mathrm{bmdp}(m)}(\beta)$ and
$Q^*_m(\beta, a) = Q^*_{\mathrm{bmdp}(m)}(\beta, a)$.

Since  belief  MDPs  are  the  subject  of  previous  work,  we  defer  details  and  proofs  to  Appendix  A. 

To define the behaviors of a POMDP, we do not focus on prefixes of executions that refer to the state
space of the POMDP since the agent does not have knowledge of the current state of its environment. Rather, we
use the belief states that the agent could hold. We say that a sequence $[\beta_1, a_1, o_1, \beta_2, a_2, o_2, \ldots, \beta_n, a_n, o_n]$
in $(\mathcal{B} \times \mathcal{A} \times \mathcal{O})^*$ is a possible behavior of a POMDP $m$ if and only if $[\beta_1, a_1, \beta_2, a_2, \ldots, \beta_n, a_n]$ is a possible
behavior of $\mathrm{bmdp}(m)$ and for all $i < n$, $\beta_{i+1} = \mathrm{update}_m(\beta_i, a_i, o_i)$.

3.2.2 Advertising Example: Model


The POMDP Model. The model of the example involving an advertising website shown in Figure 3.3
can be formalized as a POMDP $m_{adv} = \langle \mathcal{S}, \mathcal{A}, t, r, \mathcal{O}, \nu, \gamma \rangle$. Formally, the state space is $\mathcal{S} = \{\mathsf{f}, \mathsf{m}\} \times
\{\mathsf{f}, \mathsf{m}, \bot\} \times \{\mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3, \emptyset\}$ where $\mathsf{f}$ indicates a female; $\mathsf{m}$, a male; $\bot$, that the database does not contain
the visitor's sex; $\mathsf{ad}_i$, that the website has shown the visitor the $i$th advertisement; and $\emptyset$, that the website
has not shown an advertisement.

The action space is $\mathcal{A} = \{\mathsf{lookup}, \mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3\}$. The actions $\mathsf{ad}_1$, $\mathsf{ad}_2$, and $\mathsf{ad}_3$ correspond to the
website showing the visitor one of the three possible advertisements while lookup corresponds to the website
looking up information on the visitor.

The actions and states are related by the transition relation $t$. While the website has uncertainty about
the sex of the visitor, the transition relation $t$ is deterministic given the current state of the environment and
the website's action. Thus, in this model, for all states $s$ and actions $a$, the distribution $t(s, a)$ is always a
degenerate distribution. The degenerate distribution $\mathrm{degen}(x)$ is equal to 1 at the value of $x$ and 0 everywhere
else (i.e., $\mathrm{degen}(x)(x) = 1$ and for all $y \neq x$, $\mathrm{degen}(x)(y) = 0$). In our model, $t(\langle g, d, \emptyset \rangle, \mathsf{ad}_i) =
\mathrm{degen}(\langle g, d, \mathsf{ad}_i \rangle)$ for all $g$ in $\{\mathsf{f}, \mathsf{m}\}$, $d$ in $\{\mathsf{f}, \mathsf{m}, \bot\}$, and $i$ in $\{1, 2, 3\}$ reflecting that showing an advertisement
does not change the visitor's sex or the website's database. Furthermore, $t(\langle g, d, \mathsf{ad}_i \rangle, \mathsf{ad}_j) =
\mathrm{degen}(\langle g, d, \mathsf{ad}_i \rangle)$ since the website can show the visitor only one advertisement. Lastly, $t(s, \mathsf{lookup}) =
\mathrm{degen}(s)$ since looking up information in the database does not change the state of the environment. (It can,
however, change the belief state of the agent.)

These transitions are accompanied by a reward. The reward for showing an advertisement depends upon
the visitor's sex as follows:

\begin{align*}
r(\langle \mathsf{f}, d, \emptyset \rangle, \mathsf{ad}_1) &= 9 & r(\langle \mathsf{m}, d, \emptyset \rangle, \mathsf{ad}_1) &= 3 \\
r(\langle \mathsf{f}, d, \emptyset \rangle, \mathsf{ad}_2) &= 7 & r(\langle \mathsf{m}, d, \emptyset \rangle, \mathsf{ad}_2) &= 7 \\
r(\langle \mathsf{f}, d, \emptyset \rangle, \mathsf{ad}_3) &= 3 & r(\langle \mathsf{m}, d, \emptyset \rangle, \mathsf{ad}_3) &= 9
\end{align*}

with $r(s, a) = 0$ for all other states $s$ and actions $a$.

The observation space is $\mathcal{O} = \{\mathsf{f}, \mathsf{m}, \bot\} \times \{\mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3, \emptyset\}$. The observation $\langle d, a \rangle$ reveals that the
database holds information $d$ about the visitor's sex and that the visitor has seen advertisement $a$ (with
$a = \emptyset$ if the visitor has not seen one).

The function $\nu$ relates these observations to actions and states. Again we restrict our attention to degenerate
distributions since this example contains uncertainty but not truly random processes. For each state
$\langle g, d, a \rangle$, the lookup action results in the observation $\langle d, a \rangle$. Thus, $\nu(\mathsf{lookup}, \langle g, d, a \rangle) = \mathrm{degen}(\langle d, a \rangle)$.
(Note that the second argument to $\nu$ is the next state that results from performing the action, not the current
state. However, the transition relation $t$ is such that lookup does not change the state of the system and the
next state is the current state.) We model showing an advertisement as providing the dummy observation $o$:
$\nu(\mathsf{ad}_i, s) = \mathrm{degen}(o)$ for all $i$ and states $s$.

We use a discounting factor of $\gamma = 0.9$ (to pick an arbitrary but reasonable value). These components
define the example POMDP to be $m_{adv} = \langle \mathcal{S}, \mathcal{A}, t, r, \mathcal{O}, \nu, \gamma \rangle$. The first part of a possible execution of $m_{adv}$
with a total reward of 7 is

\[
[\langle \mathsf{f}, \bot, \emptyset \rangle, \mathsf{ad}_2, o, \langle \mathsf{f}, \bot, \mathsf{ad}_2 \rangle, \ldots]
\]

As mentioned above, all of the transitions of the POMDP model result in degenerate distributions. The
paucity of non-degenerate probabilistic transitions is a feature of this example in which none of the actions
involve random processes. Rather, the uncertainty in this example results from the website not knowing a
priori the sex of the visitor. This uncertainty is captured by the model by having the website not know a
priori what its initial state is. Intuitively, the action lookup attempts to remove this uncertainty by providing
the website with an observation that reduces the set of states that it has to consider possible. In the case of
making the observation $\langle \mathsf{f}, a \rangle$ or $\langle \mathsf{m}, a \rangle$, this reduction identifies the visitor's sex.
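
The components described above can be written down directly. The following Python sketch is one possible encoding, not a definitive implementation: the strings `"bot"`, `"none"`, and `"dummy"` are stand-ins chosen here for the missing-record symbol, the no-advertisement symbol, and the dummy observation, and the helper `discounted_reward` is hypothetical.

```python
from itertools import product

SEXES = ["f", "m"]
DB = ["f", "m", "bot"]        # "bot": the database has no record for the visitor
ADS = ["ad1", "ad2", "ad3"]
NONE = "none"                 # no advertisement has been shown yet
GAMMA = 0.9

S = list(product(SEXES, DB, ADS + [NONE]))   # factored state space
A = ["lookup"] + ADS
# The dummy observation is appended so the list covers everything nu can return.
O = list(product(DB, ADS + [NONE])) + ["dummy"]

REWARD = {("f", "ad1"): 9, ("f", "ad2"): 7, ("f", "ad3"): 3,
          ("m", "ad1"): 3, ("m", "ad2"): 7, ("m", "ad3"): 9}


def t(s, a):
    """Deterministic transitions: only the first advertisement changes the state."""
    g, d, shown = s
    if a != "lookup" and shown == NONE:
        return {(g, d, a): 1.0}
    return {s: 1.0}


def r(s, a):
    """Reward is earned only by the first advertisement shown."""
    g, d, shown = s
    return float(REWARD[(g, a)]) if a != "lookup" and shown == NONE else 0.0


def nu(a, s_next):
    """lookup reveals the database entry and the ad shown; ads reveal nothing."""
    g, d, shown = s_next
    return {(d, shown): 1.0} if a == "lookup" else {"dummy": 1.0}


def discounted_reward(s, actions):
    """Total discounted reward of a finite action sequence started in state s."""
    total = 0.0
    for k, a in enumerate(actions):
        total += GAMMA ** k * r(s, a)
        s = next(iter(t(s, a)))  # transitions are deterministic
    return total


print(len(S), discounted_reward(("f", "bot", NONE), ["ad2", "lookup", "ad1"]))
```

Starting in the state where a female visitor has no database record, showing the second advertisement first earns 7, and no later action adds to that total, matching the execution given above.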

The way we represent the POMDP $m_{adv}$ in Figure 3.3 fails to represent many parts of the formal model.
For example, the figure does not show that performing an action $\mathsf{ad}_i$ in state $\langle \mathsf{f}, \mathsf{f}, \mathsf{ad}_1 \rangle$ results in the website
observing $o$. While this figure serves to focus the reader's attention on the interesting aspects of the POMDPs
illustrated, the figure is not intended to replace its formal representation.

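Beliefs and Optimal Actions. We can formalize the intuitive initial beliefs we assigned to the website
as a belief state $\beta_0$. The belief that the database is accurate requires that $\beta_0(\langle g_1, g_2, a \rangle) = 0$ for $a$ in
$\{\mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3, \emptyset\}$ and all $g_1$ and $g_2$ in $\{\mathsf{f}, \mathsf{m}\}$ such that $g_1 \neq g_2$. That the website has not shown an advertisement
to the visitor yet requires that $\beta_0(\langle g, d, \mathsf{ad}_i \rangle) = 0$ for all $g$ in $\{\mathsf{f}, \mathsf{m}\}$, $d$ in $\{\mathsf{f}, \mathsf{m}, \bot\}$, and $i$ in
$\{1, 2, 3\}$. That visitors are equally likely to be female or male requires that $\beta_0(\langle \mathsf{f}, d, a \rangle) = \beta_0(\langle \mathsf{m}, d, a \rangle)$ for
all $d$ and $a$. The final constraint that the database be useful does not yield an exact constraint on $\beta_0$. Rather,
we interpret it to imply the inequality $\beta_0(\langle g, g, a \rangle) > \beta_0(\langle g, \bot, a \rangle)$. To be concrete, we take $\beta_0$ to be as
follows:

\begin{align*}
\beta_0(\langle \mathsf{f}, \mathsf{f}, \emptyset \rangle) &= 0.4 \\
\beta_0(\langle \mathsf{m}, \mathsf{m}, \emptyset \rangle) &= 0.4 \\
\beta_0(\langle \mathsf{f}, \bot, \emptyset \rangle) &= 0.1 \\
\beta_0(\langle \mathsf{m}, \bot, \emptyset \rangle) &= 0.1 \\
\beta_0(\langle g, d, \mathsf{ad}_i \rangle) &= 0 \quad \text{for all $g$ in $\{\mathsf{f}, \mathsf{m}\}$, $d$ in $\{\mathsf{f}, \mathsf{m}, \bot\}$, and $i$ in $\{1, 2, 3\}$}
\end{align*}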


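The optimal action to perform from the belief state $\beta_0$ is lookup. To see this result, note that after
showing the visitor an advertisement, no further rewards are possible. Thus,

\begin{align*}
Q^*_{m_{adv}}(\beta_0, \mathsf{ad}_1) &= R_{m_{adv}}(\beta_0, \mathsf{ad}_1) + \gamma \sum_{o \in \mathcal{O}} N_{m_{adv}}(\beta_0, \mathsf{ad}_1)(o) * V^*_{m_{adv}}(\mathrm{update}_{m_{adv}}(\beta_0, \mathsf{ad}_1, o)) \tag{3.9} \\
&= R_{m_{adv}}(\beta_0, \mathsf{ad}_1) + \gamma \sum_{o \in \mathcal{O}} N_{m_{adv}}(\beta_0, \mathsf{ad}_1)(o) * 0 \tag{3.10} \\
&= \sum_{s \in \mathcal{S}} \beta_0(s) * r(s, \mathsf{ad}_1) \tag{3.11} \\
&= \sum_{d, a} \beta_0(\langle \mathsf{f}, d, a \rangle) * r(\langle \mathsf{f}, d, a \rangle, \mathsf{ad}_1) + \sum_{d, a} \beta_0(\langle \mathsf{m}, d, a \rangle) * r(\langle \mathsf{m}, d, a \rangle, \mathsf{ad}_1) \tag{3.12} \\
&= (0.4 * 9 + 0.1 * 9) + (0.4 * 3 + 0.1 * 3) \tag{3.13} \\
&= 6 \tag{3.14}
\end{align*}

where Line 3.10 comes from the fact that no further rewards are possible after showing a single advertisement.
Similarly,

\begin{align*}
Q^*_{m_{adv}}(\beta_0, \mathsf{ad}_2) &= (0.4 * 7 + 0.1 * 7) + (0.4 * 7 + 0.1 * 7) = 7 \\
Q^*_{m_{adv}}(\beta_0, \mathsf{ad}_3) &= (0.4 * 3 + 0.1 * 3) + (0.4 * 9 + 0.1 * 9) = 6
\end{align*}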
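
The arithmetic above can be checked mechanically. The sketch below, in Python, evaluates the expected immediate reward of each advertisement under the initial beliefs; because no reward is possible after an advertisement is shown, this expectation is the whole quality value. The names `beta0`, `REWARD`, and `q_ad`, and the strings `"bot"` and `"none"`, are stand-ins introduced for this sketch.

```python
# Initial beliefs: (visitor's sex, database entry, advertisement shown) -> probability.
beta0 = {
    ("f", "f", "none"): 0.4,
    ("m", "m", "none"): 0.4,
    ("f", "bot", "none"): 0.1,
    ("m", "bot", "none"): 0.1,
}

# Reward for showing a first advertisement, indexed by (visitor's sex, advertisement).
REWARD = {("f", "ad1"): 9, ("f", "ad2"): 7, ("f", "ad3"): 3,
          ("m", "ad1"): 3, ("m", "ad2"): 7, ("m", "ad3"): 9}


def q_ad(belief, ad):
    """Quality of showing `ad` from a belief in which no ad has been shown yet.

    The future term vanishes, so the value is the expected immediate reward.
    """
    return sum(p * REWARD[(g, ad)] for (g, d, shown), p in belief.items())


for ad in ["ad1", "ad2", "ad3"]:
    print(ad, round(q_ad(beta0, ad), 10))
```

The middle advertisement scores highest among the three when the website acts without looking anything up, which is the baseline that the lookup action must beat.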


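To compute $Q^*_{m_{adv}}(\beta_0, \mathsf{lookup})$, we must examine the rewards that can occur after performing lookup.
These rewards depend upon the observations made from lookup. Three observations are possible after
performing lookup from $\beta_0$: $\langle \mathsf{f}, \emptyset \rangle$, $\langle \mathsf{m}, \emptyset \rangle$, and $\langle \bot, \emptyset \rangle$. In the case of observing $\langle \mathsf{f}, \emptyset \rangle$, the next belief state
will be $\mathrm{update}_{m_{adv}}(\beta_0, \mathsf{lookup}, \langle \mathsf{f}, \emptyset \rangle) = \beta_{\mathsf{f}}$ where

\begin{align*}
\beta_{\mathsf{f}}(\langle \mathsf{f}, \mathsf{f}, \emptyset \rangle) &= \frac{\nu(\mathsf{lookup}, \langle \mathsf{f}, \mathsf{f}, \emptyset \rangle)(\langle \mathsf{f}, \emptyset \rangle) \sum_{s \in \mathcal{S}} \beta_0(s) * t(s, \mathsf{lookup})(\langle \mathsf{f}, \mathsf{f}, \emptyset \rangle)}{\sum_{s' \in \mathcal{S}} \nu(\mathsf{lookup}, s')(\langle \mathsf{f}, \emptyset \rangle) \sum_{s \in \mathcal{S}} \beta_0(s) * t(s, \mathsf{lookup})(s')} \tag{3.15} \\
&= \frac{1 * (0.4 * 1)}{1 * (0.4 * 1)} \tag{3.16} \\
&= 1 \tag{3.17}
\end{align*}

where Line 3.15 comes from Lines 3.5 and 3.6. Line 3.16 comes from the fact that $t(s, \mathsf{lookup})(s')$ is equal
to 1 for $s = s'$ and that $\nu(\mathsf{lookup}, \langle g, d, a \rangle) = \mathrm{degen}(\langle d, a \rangle)$. Similarly, $\mathrm{update}_{m_{adv}}(\beta_0, \mathsf{lookup}, \langle \mathsf{m}, \emptyset \rangle) =
\beta_{\mathsf{m}} = \mathrm{degen}(\langle \mathsf{m}, \mathsf{m}, \emptyset \rangle)$ and $\mathrm{update}_{m_{adv}}(\beta_0, \mathsf{lookup}, \langle \bot, \emptyset \rangle) = \beta_{\bot}$ where $\beta_{\bot}(\langle \mathsf{f}, \bot, \emptyset \rangle) = 0.5$ and
$\beta_{\bot}(\langle \mathsf{m}, \bot, \emptyset \rangle) = 0.5$. Using calculations similar to those done for $Q^*_{m_{adv}}(\beta_0, \mathsf{ad}_1)$, we can compute
$Q^*_{m_{adv}}(\beta_{\mathsf{f}}, \mathsf{ad}_1)$. Doing these calculations for every action and each of the belief states $\beta_{\mathsf{f}}$, $\beta_{\mathsf{m}}$, and $\beta_{\bot}$,
we can find the optimal actions and their values for each of these three belief states:

\begin{align*}
V^*_{m_{adv}}(\beta_{\mathsf{f}}) &= Q^*_{m_{adv}}(\beta_{\mathsf{f}}, \mathsf{ad}_1) = 9 \\
V^*_{m_{adv}}(\beta_{\mathsf{m}}) &= Q^*_{m_{adv}}(\beta_{\mathsf{m}}, \mathsf{ad}_3) = 9 \\
V^*_{m_{adv}}(\beta_{\bot}) &= Q^*_{m_{adv}}(\beta_{\bot}, \mathsf{ad}_2) = 7
\end{align*}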
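
These three belief states and their values can be reproduced with a short Python sketch. Because lookup leaves the state unchanged and deterministically reveals the database entry, the update reduces to conditioning the initial beliefs on that entry; this shortcut, like the names `after_lookup` and `best_ad` and the strings `"bot"` and `"none"`, is an assumption of the sketch. Repeating lookup is not considered, since it would only delay the same reward by one discounted step.

```python
beta0 = {
    ("f", "f", "none"): 0.4,
    ("m", "m", "none"): 0.4,
    ("f", "bot", "none"): 0.1,
    ("m", "bot", "none"): 0.1,
}

REWARD = {("f", "ad1"): 9, ("f", "ad2"): 7, ("f", "ad3"): 3,
          ("m", "ad1"): 3, ("m", "ad2"): 7, ("m", "ad3"): 9}


def after_lookup(belief, entry):
    """Belief after lookup reveals the database entry `entry`.

    Keeps the states whose database component matches and renormalizes.
    """
    kept = {s: p for s, p in belief.items() if s[1] == entry}
    norm = sum(kept.values())
    return {s: p / norm for s, p in kept.items()}


def best_ad(belief):
    """Return the advertisement with the highest expected reward and that reward."""
    values = {ad: sum(p * REWARD[(g, ad)] for (g, d, shown), p in belief.items())
              for ad in ["ad1", "ad2", "ad3"]}
    ad = max(values, key=values.get)
    return ad, values[ad]


for entry in ["f", "m", "bot"]:
    print(entry, after_lookup(beta0, entry), best_ad(after_lookup(beta0, entry)))
```

When the database is silent, the visitor is equally likely to be female or male, and the middle advertisement is the best hedge.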


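Furthermore,

\begin{align*}
N_{m_{adv}}(\beta_0, \mathsf{lookup})(\langle \mathsf{f}, \emptyset \rangle) &= \sum_{s \in \mathcal{S}} \beta_0(s) * \sum_{s' \in \mathcal{S}} t(s, \mathsf{lookup})(s') * \nu(\mathsf{lookup}, s')(\langle \mathsf{f}, \emptyset \rangle) \tag{3.18} \\
&= \sum_{s \in \mathcal{S}} \beta_0(s) * \nu(\mathsf{lookup}, s)(\langle \mathsf{f}, \emptyset \rangle) \tag{3.19} \\
&= \beta_0(\langle \mathsf{f}, \mathsf{f}, \emptyset \rangle) * \nu(\mathsf{lookup}, \langle \mathsf{f}, \mathsf{f}, \emptyset \rangle)(\langle \mathsf{f}, \emptyset \rangle) \tag{3.20} \\
&= 0.4 * 1 \tag{3.21} \\
&= 0.4 \tag{3.22}
\end{align*}

where Line 3.19 results from the fact that $t(s, \mathsf{lookup}) = \mathrm{degen}(s)$. Line 3.20 follows from the fact that the
only state $s$ such that both $\beta_0(s)$ and $\nu(\mathsf{lookup}, s)(\langle \mathsf{f}, \emptyset \rangle)$ are non-zero is $\langle \mathsf{f}, \mathsf{f}, \emptyset \rangle$. Similarly,

\begin{align*}
N_{m_{adv}}(\beta_0, \mathsf{lookup})(\langle \mathsf{m}, \emptyset \rangle) &= \beta_0(\langle \mathsf{m}, \mathsf{m}, \emptyset \rangle) * \nu(\mathsf{lookup}, \langle \mathsf{m}, \mathsf{m}, \emptyset \rangle)(\langle \mathsf{m}, \emptyset \rangle) \\
&= 0.4
\end{align*}

and

\begin{align*}
N_{m_{adv}}(\beta_0, \mathsf{lookup})(\langle \bot, \emptyset \rangle)
&= \beta_0(\langle \mathsf{f}, \bot, \emptyset \rangle) * \nu(\mathsf{lookup}, \langle \mathsf{f}, \bot, \emptyset \rangle)(\langle \bot, \emptyset \rangle) + \beta_0(\langle \mathsf{m}, \bot, \emptyset \rangle) * \nu(\mathsf{lookup}, \langle \mathsf{m}, \bot, \emptyset \rangle)(\langle \bot, \emptyset \rangle) \\
&= 0.1 * 1 + 0.1 * 1 \\
&= 0.2
\end{align*}

and, for the dummy observation $o$,

\begin{align*}
N_{m_{adv}}(\beta_0, \mathsf{lookup})(o) &= \sum_{s \in \mathcal{S}} \beta_0(s) * \sum_{s' \in \mathcal{S}} t(s, \mathsf{lookup})(s') * \nu(\mathsf{lookup}, s')(o) \\
&= \sum_{s \in \mathcal{S}} \beta_0(s) * \sum_{s' \in \mathcal{S}} t(s, \mathsf{lookup})(s') * 0 \\
&= 0
\end{align*}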

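Putting these parts together, we find that

\[
\begin{aligned}
&Q^*_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}) \\
&= r_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}) + \gamma \sum_{o \in O} N_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup})(o) * V^*_{m_{\mathrm{adv}}}(\mathrm{update}_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}, o)) \\
&= 0 + 0.9 \sum_{o \in O} N_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup})(o) * V^*_{m_{\mathrm{adv}}}(\mathrm{update}_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}, o)) \\
&= 0.9 \left( \begin{array}{l}
N_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup})((\mathsf{f}, \emptyset)) * V^*_{m_{\mathrm{adv}}}(\mathrm{update}_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}, (\mathsf{f}, \emptyset))) \\
{} + N_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup})((\mathsf{m}, \emptyset)) * V^*_{m_{\mathrm{adv}}}(\mathrm{update}_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}, (\mathsf{m}, \emptyset))) \\
{} + N_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup})((\bot, \emptyset)) * V^*_{m_{\mathrm{adv}}}(\mathrm{update}_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}, (\bot, \emptyset))) \\
{} + N_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup})(o) * V^*_{m_{\mathrm{adv}}}(\mathrm{update}_{m_{\mathrm{adv}}}(\beta_0, \mathsf{lookup}, o))
\end{array} \right) \\
&= 0.9 \, (0.4 * V^*_{m_{\mathrm{adv}}}(\beta_{\mathsf{f}}) + 0.4 * V^*_{m_{\mathrm{adv}}}(\beta_{\mathsf{m}}) + 0.2 * V^*_{m_{\mathrm{adv}}}(\beta_{\bot}) + 0) \\
&= 0.9 \, (0.4 * 9 + 0.4 * 9 + 0.2 * 7) \\
&= 7.74
\end{aligned}
\]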

Comparing this value to $Q^*_{m_{\mathrm{adv}}}(\beta_0, \mathsf{ad}_1)$, $Q^*_{m_{\mathrm{adv}}}(\beta_0, \mathsf{ad}_2)$, and
$Q^*_{m_{\mathrm{adv}}}(\beta_0, \mathsf{ad}_3)$, we find that lookup is the optimal action for the website to take from belief state $\beta_0$.

Thus, an optimal strategy $\sigma^*$ for $m_{\mathrm{adv}}$ must be such that $\sigma^*(\beta_0) = \mathsf{lookup}$,
$\sigma^*(\beta_{\mathsf{f}}) = \mathsf{ad}_1$, $\sigma^*(\beta_{\mathsf{m}}) = \mathsf{ad}_3$, and $\sigma^*(\beta_{\bot}) = \mathsf{ad}_2$.
These results match our intuitions. Various optimal strategies differ as to what the
website does after showing the advertisement, as such actions do not affect the reward. (We return to this
point later when we consider non-redundancy in Section 3.2.4.)
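
The calculation above can be replayed mechanically. The following is a minimal Python sketch, not part of the formal development: the string stand-ins `NONE` and `UNK` (for the empty advertisement slot and a missing database entry) and all function names are assumptions made for illustration, and the three optimal values 9, 9, and 7 are taken from the text rather than recomputed.

```python
NONE = "none"  # stand-in for the "no advertisement shown yet" component
UNK = "unk"    # stand-in for a missing database entry (bottom)
GAMMA = 0.9    # discount factor used in the example

# Initial beliefs over states (g, d, a): actual sex, database entry, ad shown.
beta0 = {
    ("f", "f", NONE): 0.4,
    ("f", UNK, NONE): 0.1,
    ("m", "m", NONE): 0.4,
    ("m", UNK, NONE): 0.1,
}

def t_lookup(state):
    """t(s, lookup) = degen(s): lookup leaves the state unchanged."""
    return {state: 1.0}

def nu_lookup(next_state):
    """nu(lookup, (g, d, a)) = degen((d, a)): the website sees the entry d."""
    _, d, a = next_state
    return {(d, a): 1.0}

def predict(beta):
    """Next-state distribution: sum over s of beta(s) * t(s, lookup)(s')."""
    out = {}
    for s, p in beta.items():
        for s_next, q in t_lookup(s).items():
            out[s_next] = out.get(s_next, 0.0) + p * q
    return out

def obs_prob(beta, obs):
    """N(beta, lookup)(obs): probability of observing obs after lookup."""
    return sum(p * nu_lookup(s_next).get(obs, 0.0)
               for s_next, p in predict(beta).items())

def update(beta, obs):
    """Belief state after performing lookup from beta and observing obs."""
    norm = obs_prob(beta, obs)
    weighted = {s_next: p * nu_lookup(s_next).get(obs, 0.0)
                for s_next, p in predict(beta).items()}
    return {s_next: w / norm for s_next, w in weighted.items() if w > 0}

# Optimal values of the successor belief states, as reported in the text.
# The dummy observation o is omitted because it has probability 0 after lookup.
v_star = {("f", NONE): 9.0, ("m", NONE): 9.0, (UNK, NONE): 7.0}

# Q*(beta0, lookup) = 0 + gamma * sum over o of N(o) * V*(update(beta0, lookup, o)).
q_lookup = 0.0 + GAMMA * sum(obs_prob(beta0, o) * v for o, v in v_star.items())

print(update(beta0, ("f", NONE)))
print(round(q_lookup, 2))
```

The sketch hard-codes the lookup action because it is the only one whose transition and observation distributions are spelled out in this passage; extending it to the advertising actions would need the reward function defined earlier in the section.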

3.2.3  Physician  Example:  Model 

The POMDP Model. The example shown in Figure 3.2 corresponds to a POMDP $m_{\mathrm{phy}}$, which is equal to
$(S, A, t, r, O, \nu, \gamma)$ where

• $S = \{s_1, s_2, \ldots, s_7, s_8\}$;
state s    action a          distribution t(s, a) over next states
-------    --------          -------------------------------------
$s_1$      take              $\mathrm{degen}(s_3)$
$s_2$      take              $\mathrm{degen}(s_4)$
$s_3$      send$x_1$         $\mathrm{degen}(s_5)$
$s_3$      diagnose          $\mathrm{degen}(s_7)$
$s_4$      send$x_2$         $t(s_4, a)(s_8) = 0.2$, $t(s_4, a)(s_6) = 0.8$, $t(s_4, a)(s) = 0$ for $s \notin \{s_6, s_8\}$
$s_5$      diagnose          $\mathrm{degen}(s_7)$
$s_6$      diagnose          $\mathrm{degen}(s_8)$
$s$        $a$               $\mathrm{degen}(s)$ for all remaining $s$ and $a$

Table 3.1: Transition relation for the POMDP $m_{\mathrm{phy}}$. Recall that the degenerate distribution $\mathrm{degen}(x)$ is equal
to 1 at the value of $x$ and 0 everywhere else.


• $A = \{\mathsf{take}, \mathsf{send}x_1, \mathsf{send}x_2, \mathsf{diagnose}\}$;

• $t$ is defined in Table 3.1;

• $r$ is such that $r(s_3, \mathsf{diagnose}) = r(s_5, \mathsf{diagnose}) = r(s_6, \mathsf{diagnose}) = 12$ and $r(s, a) = 0$ for all other
values of $s \in S$ and $a \in A$;

• $O = \{x_1, x_2, o\}$;

• $\nu$ is defined such that $\nu(\mathsf{take}, s_3) = \mathrm{degen}(x_1)$, $\nu(\mathsf{take}, s_4) = \mathrm{degen}(x_2)$, and $\nu(a, s') = \mathrm{degen}(o)$
for all other actions $a$ and states $s'$; and

• $\gamma$, which is not represented in the figure, is 0.9 (to pick an arbitrary but reasonable value).

As with the POMDP $m_{\mathrm{adv}}$ of Section 3.2.2, this POMDP does not feature many non-degenerate probabilistic
transitions. Again, this paucity is a feature of the example, in which few of the actions involve random
processes. Much of the uncertainty in this example results from the physician not knowing the status of his
patient, which the model captures by having the physician not know a priori what his initial state
is. Intuitively, the action take removes this uncertainty by providing the physician with an observation that
identifies his current state.

As with the model $m_{\mathrm{adv}}$, the way we represent the POMDP $m_{\mathrm{phy}}$ in Figure 3.2 also does not represent
many parts of the formal model. For example, the figure does not show that performing the action take in
state $s_3$ results in the physician observing $x_1$. This fact is implied since $\nu$ only depends upon the resulting
state ($s_3$) and the action, not the original state ($s_1$ or $s_3$).

Beliefs and Optimal Actions. Consider the initial beliefs $\beta_0$ discussed in Section 3.1.1 that assign non-zero
probabilities to only the possible initial states ($\beta_0(s_1) = 0.9$ and $\beta_0(s_2) = 0.1$). Under the model
$m_{\mathrm{phy}}$, starting from the initial belief state $\beta_0$, the physician will learn with certainty which state he is in after
observing the value of the X-ray. Thus, his belief state will be a degenerate distribution after performing
the action take. Performing calculations similar to those performed in Section 3.2.2, we may find that
to be an optimal strategy for $m_{\mathrm{phy}}$, a strategy $\sigma^*$ must be such that $\sigma^*(\beta_0) = \mathsf{take}$,
$\sigma^*(\mathrm{degen}(s_3)) = \sigma^*(\mathrm{degen}(s_5)) = \sigma^*(\mathrm{degen}(s_6)) = \mathsf{diagnose}$, and $\sigma^*(\mathrm{degen}(s_4)) = \mathsf{send}x_2$.
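
To illustrate why the physician's belief state becomes degenerate after take, here is a small Python sketch of the belief update for that one action. It is an illustration only: the dictionary encoding and the function name `update_take` are assumptions, and only the take rows of the transition table and the definition of the observation distribution are used.

```python
# Initial beliefs: the physician does not know whether he starts in s1 or s2.
beta0 = {"s1": 0.9, "s2": 0.1}

# Degenerate transitions for take; every other state is a self-loop.
t_take = {"s1": "s3", "s2": "s4"}

# Degenerate observations after take; every other resulting state yields the
# dummy observation "o".
nu_take = {"s3": "x1", "s4": "x2"}

def update_take(beta, obs):
    """Belief state after performing take from beta and observing obs."""
    weights = {}
    for s, p in beta.items():
        s_next = t_take.get(s, s)
        if nu_take.get(s_next, "o") == obs:
            weights[s_next] = weights.get(s_next, 0.0) + p
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observation has probability zero under these beliefs")
    return {s: w / total for s, w in weights.items()}

print(update_take(beta0, "x1"))
print(update_take(beta0, "x2"))
```

Either observation leaves a single state with non-zero probability, which is the sense in which take identifies the physician's current state.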
3.2.4 Non-redundancy


In the advertising example, the actions of the website after showing the advertisement are unconstrained.
The reason is that showing the advertisement will result in the current state of the POMDP becoming of the
form $(g, d, \mathsf{ad}_i)$. States of this form are all absorbing: all possible actions result in returning to the same
state. Furthermore, all the actions possible from these states result in zero reward. Since the only criterion
of an optimal strategy is its expected total discounted reward, a strategy may assign any action to these states
without changing whether it is optimal.

This effect leads to the counterintuitive result that performing lookup in the belief state $\mathrm{degen}((\mathsf{f}, \mathsf{f}, \mathsf{ad}_1))$
is for the purpose of marketing. This result is counterintuitive since the same total reward is possible regardless
of whether the agent performs the lookup action from the state $(\mathsf{f}, \mathsf{f}, \mathsf{ad}_1)$. Thus, the agent believes that
it is performing an action that cannot further its total reward and yet it is still for the purpose represented by
the reward.

Indeed, this counterintuitive result is true of all the actions in $A$. Yet, by the definition of the POMDP,
the agent must continue to perform one of the four actions in $A$ despite none of them adding to the total reward.
Intuitively, the agent should just stop.

We have already formalized a solution to this counterintuitive result for MDPs using the idea of non-redundancy
in Section 2.1.2. We may apply the same idea to POMDPs. We add to each POMDP a distinguished
action stop that indicates that the agent stops and does nothing more (for the purpose in question).
The stop action always produces zero reward and results in no state change (i.e., $r(s, \mathsf{stop}) = 0$ and
$t(s, \mathsf{stop}) = \mathrm{degen}(s)$ for all $s$ in $S$). The action stop is always followed by the dummy observation $o$ (i.e.,
$\nu(\mathsf{stop}, s) = \mathrm{degen}(o)$ for all $s$ in $S$).

An action $a$ from a belief state $\beta$ is redundant if it is no better than stopping (i.e., if $Q^*_m(\beta, a) \leq
Q^*_m(\beta, \mathsf{stop}) = 0$). A strategy is non-redundant if it never requires a redundant action from any belief state.
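
As a rough illustration of the redundancy test, the Python sketch below filters the actions available in one belief state by comparing their values against the value 0 of stopping. The function names and the tolerance parameter are hypothetical; the value tables are supplied by hand, using the zero values of the absorbing states and the value 9 reported for showing the first advertisement after a successful lookup.

```python
STOP = "stop"

def non_redundant_actions(q_values, tol=1e-9):
    """Actions strictly better than stopping, whose value is taken to be 0."""
    return {a for a, q in q_values.items() if a != STOP and q > tol}

def allowed_actions(q_values, tol=1e-9):
    """Optimal actions that are also non-redundant; fall back to stop."""
    best = max(q_values.values(), default=0.0)
    candidates = {a for a in non_redundant_actions(q_values, tol)
                  if q_values[a] >= best - tol}
    return candidates or {STOP}

# Absorbing belief state reached after showing an advertisement: every action
# yields zero total reward, so every action other than stop is redundant.
after_ad = {"lookup": 0.0, "ad1": 0.0, "ad2": 0.0, "ad3": 0.0, STOP: 0.0}

# Belief state reached after looking up a female visitor; only the value of
# the first advertisement is reported in the text, so only it is listed.
after_lookup = {"ad1": 9.0, STOP: 0.0}

print(allowed_actions(after_ad))
print(allowed_actions(after_lookup))
```

Under this filter the agent stops once it has shown an advertisement, which is the behavior the non-redundancy requirement is meant to force.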

In Section 2.1.2, we converted the MDP model into the NMDP model by considering strategies that
contain redundant actions to be suboptimal. Similarly, we can create a non-redundant POMDP model
(NPOMDP). To do so, we require that the agent select not just any strategy from the set of those that maximize
the expected total discounted reward, but rather that it select only a strategy that both maximizes the
reward and is non-redundant. More formally, we define the set of optimal strategies for an NPOMDP $m$ to
be $\mathrm{nopt}(m) = \mathrm{nopt}(\mathrm{bmdp}(m))$ where nopt is defined for MDPs in Section 2.1.2. By Theorem 1, $\mathrm{nopt}(m)$
is not empty since $\mathrm{bmdp}(m)$ is an MDP.

We define $\mathrm{nbehv}(m)$ such that a sequence $[\beta_1, a_1, o_1, \beta_2, a_2, o_2, \ldots, \beta_n, a_n, o_n]$ in $(B \times A \times O)^*$ is in
$\mathrm{nbehv}(m)$ if and only if it is a possible behavior of $m$ and $[\beta_1, a_1, \beta_2, a_2, \ldots, \beta_n, a_n]$ is in $\mathrm{nbehv}(\mathrm{bmdp}(m))$.

Henceforth, we will use $m_{\mathrm{phy}}$ and $m_{\mathrm{adv}}$ to refer to the NPOMDP versions of the models presented in
Sections 3.2.3 and 3.2.2, respectively. For example, $m_{\mathrm{phy}}$ will now refer to the NPOMDP such that

• $S$ is as before: $S = \{s_1, s_2, \ldots, s_7, s_8\}$;

• $A = \{\mathsf{take}, \mathsf{send}x_1, \mathsf{send}x_2, \mathsf{diagnose}, \mathsf{stop}\}$;

• $t$ is as before except with the domain of actions now including stop: that is, as defined in Table 3.1 with the
last line of the table showing that $t(s, \mathsf{stop}) = \mathrm{degen}(s)$ for all states $s$;

• $r$ is as before except with the domain of actions now including stop:

$r(s_3, \mathsf{diagnose}) = r(s_5, \mathsf{diagnose}) = r(s_6, \mathsf{diagnose}) = 12$

and $r(s, a) = 0$ for all other values of $s \in S$ and $a \in A$ (including stop);

• $O$ is as before: $O = \{x_1, x_2, o\}$ (if $O$ did not already have the dummy observation $o$ as a member, we
would have added it);

• $\nu$ is as before except with the set of actions now including stop: $\nu(\mathsf{take}, s_3) = \mathrm{degen}(x_1)$, $\nu(\mathsf{take}, s_4) =
\mathrm{degen}(x_2)$, and $\nu(a, s') = \mathrm{degen}(o)$ for all other actions $a$ (including stop) and states $s'$; and

• $\gamma$ is as before: $\gamma = 0.9$.

The requirement of non-redundancy forces non-redundant optimal strategies $\sigma^*$ to be such that

$\sigma^*(\mathrm{degen}(s_7)) = \sigma^*(\mathrm{degen}(s_8)) = \mathsf{stop}$

in addition to the requirements resulting from optimality:

$\sigma^*(\beta_0) = \mathsf{take}$

$\sigma^*(\mathrm{degen}(s_3)) = \sigma^*(\mathrm{degen}(s_5)) = \sigma^*(\mathrm{degen}(s_6)) = \mathsf{diagnose}$

$\sigma^*(\mathrm{degen}(s_4)) = \mathsf{send}x_2$

As  with  our  figures  depicting  NMDPs,  our  figures  of  NPOMDPs  do  not  show  the  do-nothing  action  stop 
that  is  always  a  self-loop  of  zero  reward.  Also  implicit  is  that  the  observation  from  stop  is  always  the  dummy 
observation $o$. Since the features that distinguish an NPOMDP from a POMDP are not represented in our
figures  of  either,  a  single  figure  may  be  reused  to  represent  both  a  POMDP  and  the  corresponding  NPOMDP. 
For  example,  while  we  introduced  Figure  3.2  to  represent  the  POMDP  formalized  in  Section  3.2.3,  it  also 
serves  to  represent  the  NPOMDP  version  described  in  this  section. 


3.3  Modeling  Information  Use 

To gain information is to see a distinction. Thus, to ignore information corresponds to ignoring this distinction.
Below we formalize this idea using an equivalence relation that conflates information. We then apply
the formalization to our two examples.

3.3.1  Formal  Model 

To formalize the idea of using or ignoring information, we use an equivalence relation $\equiv$ over an observation
space $O$. For each equivalence class of $\equiv$, the agent will conflate its members by treating every observation
in it as indistinguishable from one another. Let ${\equiv}[o]$ denote the equivalence class that holds the observation
$o$ (i.e., ${\equiv}[o] = \{\, o' \in O \mid o' \equiv o \,\}$).

To ignore these distinctions, when the agent observes $o$, it updates its beliefs as though it has seen some
element of ${\equiv}[o]$ but is unsure of which one. That is, if the agent starts with beliefs $\beta$ and observes $o$ after
performing the action $a$, it will develop the new beliefs $\beta'$ where
$\beta'(s') = \Pr[S'{=}s' \mid O \in {\equiv}[o], A{=}a, B{=}\beta]$.
Let $\mathrm{update}'_m(\beta, a, o, \equiv)$ denote these updated beliefs $\beta'$. To show how to compute $\mathrm{update}'_m(\beta, a, o, \equiv)$, we
rewrite $\mathrm{update}'_m$ in terms of $m$ and $\equiv$ as follows:


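\[
\begin{aligned}
&\mathrm{update}'_m(\beta, a, o, \equiv)(s') \\
&= \Pr[S'{=}s' \mid O \in {\equiv}[o], A{=}a, B{=}\beta] \\
&= \frac{\Pr[O \in {\equiv}[o] \mid S'{=}s', A{=}a, B{=}\beta] \, \Pr[S'{=}s' \mid A{=}a, B{=}\beta]}{\Pr[O \in {\equiv}[o] \mid A{=}a, B{=}\beta]} \\
&= \frac{\Pr[\bigvee_{o_1 \in {\equiv}[o]} O{=}o_1 \mid S'{=}s', A{=}a, B{=}\beta] \, \Pr[S'{=}s' \mid A{=}a, B{=}\beta]}{\Pr[\bigvee_{o_1 \in {\equiv}[o]} O{=}o_1 \mid A{=}a, B{=}\beta]} \\
&= \frac{\sum_{o_1 \in {\equiv}[o]} \Pr[O{=}o_1 \mid S'{=}s', A{=}a, B{=}\beta] \, \Pr[S'{=}s' \mid A{=}a, B{=}\beta]}{\sum_{o_1 \in {\equiv}[o]} \Pr[O{=}o_1 \mid A{=}a, B{=}\beta]} \\
&= \frac{\sum_{o_1 \in {\equiv}[o]} \Pr[O{=}o_1 \mid S'{=}s', A{=}a, B{=}\beta] \sum_{s \in S} \Pr[S{=}s \mid A{=}a, B{=}\beta] \, \Pr[S'{=}s' \mid A{=}a, B{=}\beta, S{=}s]}{\sum_{o_1 \in {\equiv}[o]} \Pr[O{=}o_1 \mid A{=}a, B{=}\beta]} \\
&= \frac{\sum_{o_1 \in {\equiv}[o]} \nu(a, s')(o_1) \sum_{s \in S} \beta(s) * t(s, a)(s')}{\sum_{o_1 \in {\equiv}[o]} \Pr[O{=}o_1 \mid A{=}a, B{=}\beta]}
\end{aligned}
\]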

The third line is well defined since the agent observing $o$ after performing the action $a$ from belief state $\beta$
implies that $\Pr[O \in {\equiv}[o] \mid A{=}a, B{=}\beta] \geq \Pr[O{=}o \mid A{=}a, B{=}\beta] > 0$. We may move from the disjunction
in the fourth line to the summation in the fifth line since the agent can make only a single observation after
each action, making the different possible observations mutually exclusive events.

Looking at the denominator of the last line, since

\[
\sum_{o_1 \in {\equiv}[o]} \Pr[O{=}o_1 \mid A{=}a, B{=}\beta]
\]

is independent of $s'$, it may be treated as a normalization factor equal to

\[
\sum_{o_1 \in {\equiv}[o]} \sum_{s' \in S} \nu(a, s')(o_1) \sum_{s \in S} \beta(s) * t(s, a)(s')
\]

We show how to construct from a POMDP $m$ and an equivalence relation $\equiv$ a new POMDP that updates
its beliefs in a manner consistent with the modified updating function $\mathrm{update}'_m$. To do so, we alter the
observation space $O$ and observation distribution $\nu$ of $m$. This construction enables defining information
use with POMDPs and obviates the need to introduce a new class of models specialized for the modified
belief updating function $\mathrm{update}'$. Since the construction only affects the observation space and observation
distribution, it works identically on NPOMDPs to produce an NPOMDP that models information use and
purpose restrictions.

Given a POMDP $m$ and an equivalence relation $\equiv$ on $O$, let $m/{\equiv}$ denote the restricted POMDP that
results from the agent ignoring the distinctions among observations related by $\equiv$. We define the quotient
POMDP $m/{\equiv}$ by quotienting the observation space $O$ of $m$ with $\equiv$ so that $m/{\equiv}$ ignores information
while using the standard update function. Given $m = (S, A, t, r, O, \nu, \gamma)$, let $m/{\equiv}$ equal the POMDP
$(S, A, t, r, O/{\equiv}, \nu/{\equiv}, \gamma)$ where $O/{\equiv}$ is the partitioning of $O$ under $\equiv$ (i.e., $O/{\equiv}$ is the set of equivalence
classes of $\equiv$) and $\nu/{\equiv}(a, s')(\mathcal{O}) = \sum_{o \in \mathcal{O}} \nu(a, s')(o)$ where $\mathcal{O}$ is an element of $O/{\equiv}$ (i.e., $\mathcal{O}$ is an equivalence
class of $\equiv$).
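
The quotient of an observation distribution can be sketched in a few lines of Python. This is an illustrative encoding rather than the thesis's implementation: the function names are assumptions, the partition conflates the two X-ray values of the physician example, and the non-degenerate distribution `mixed` is a made-up input used only to show the summation.

```python
def is_partition(classes, observations):
    """Check that the classes are disjoint and together cover the observations."""
    members = [o for cls in classes for o in cls]
    return len(members) == len(set(members)) and set(members) == set(observations)

def quotient_distribution(nu_dist, classes):
    """Map a distribution over O to one over the equivalence classes.

    nu_dist plays the role of nu(a, s') for one fixed action and state; each
    class receives the total probability of the observations it contains.
    """
    return {cls: sum(nu_dist.get(o, 0.0) for o in cls) for cls in classes}

observations = {"x1", "x2", "o"}
classes = [frozenset({"x1", "x2"}), frozenset({"o"})]

degenerate = {"x1": 1.0}                    # degen(x1)
mixed = {"x1": 0.3, "x2": 0.5, "o": 0.2}    # hypothetical distribution

print(is_partition(classes, observations))
print(quotient_distribution(degenerate, classes))
print(quotient_distribution(mixed, classes))
```

Because the classes partition the observation space, the class probabilities sum to the same total as the original distribution, which is why the quotient is again a distribution.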
The following proposition (Proposition 4) verifies that $m/{\equiv}$ is a POMDP by showing that $\nu/{\equiv}$ satisfies
the requirements of being a probability distribution.

Proposition 4. For all POMDPs $m$ and equivalence relations $\equiv$ over the observation space of $m$, $m/{\equiv}$ is
a POMDP.


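Proof. We must prove that $\nu/{\equiv}$ is a well-defined probability distribution over the space of observations
$O/{\equiv}$.

\[
\begin{aligned}
\sum_{\mathcal{O} \in O/{\equiv}} \nu/{\equiv}(\mathcal{O}) &= \sum_{\mathcal{O} \in O/{\equiv}} \sum_{o \in \mathcal{O}} \nu(o) \\
&= \sum_{o \in O} \nu(o) \\
&= 1
\end{aligned}
\]

where the second line follows from $O/{\equiv}$ being a partition of $O$ and the last line follows from $\nu$ being a
distribution over $O$.

For all $\mathcal{O} \in O/{\equiv}$,

\[
\nu/{\equiv}(\mathcal{O}) = \sum_{o \in \mathcal{O}} \nu(o)
\]

As $\nu$ is a distribution, $\nu(o) \geq 0$ for all $o$ in $O$. Thus, $\sum_{o \in \mathcal{O}} \nu(o) \geq 0$. As $\mathcal{O} \subseteq O$ and $\nu$ is a distribution
over $O$, $\sum_{o \in \mathcal{O}} \nu(o) \leq \sum_{o \in O} \nu(o) = 1$. □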

The next proposition (Proposition 5) shows that $m/{\equiv}$ captures ignoring information according to the
definition of $\mathrm{update}'$.

Proposition 5. For all POMDPs $m$ and equivalence relations $\equiv$ over the observation space of $m$, $\mathrm{update}'_m$
and $\mathrm{update}_{m/{\equiv}}$ are equivalent.

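Proof. For all $\beta$, $a$, $o$, $\equiv$, and $s'$,

\[
\begin{aligned}
&\mathrm{update}'_m(\beta, a, o, \equiv)(s') \\
&= \frac{\sum_{o_1 \in {\equiv}[o]} \nu(a, s')(o_1) \sum_{s \in S} \beta(s) * t(s, a)(s')}{\sum_{o_1 \in {\equiv}[o]} \sum_{s' \in S} \nu(a, s')(o_1) \sum_{s \in S} \beta(s) * t(s, a)(s')} \\
&= \frac{\sum_{o_1 \in {\equiv}[o]} \nu(a, s')(o_1) \sum_{s \in S} \beta(s) * t(s, a)(s')}{\sum_{s' \in S} \sum_{o_1 \in {\equiv}[o]} \nu(a, s')(o_1) \sum_{s \in S} \beta(s) * t(s, a)(s')} \\
&= \frac{\nu/{\equiv}(a, s')({\equiv}[o]) \sum_{s \in S} \beta(s) * t(s, a)(s')}{\sum_{s' \in S} \nu/{\equiv}(a, s')({\equiv}[o]) \sum_{s \in S} \beta(s) * t(s, a)(s')} \\
&= \mathrm{update}_{m/{\equiv}}(\beta, a, {\equiv}[o])(s')
\end{aligned}
\]

where the last line comes from the same reasoning found in Lines 3.1 to 3.5 applied in reverse. □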

These two propositions (Propositions 4 and 5) together show that $m/{\equiv}$ is a POMDP with a new observation
space $O/{\equiv}$ that ignores information conflated by $\equiv$. Thus, we may model ignoring information using
the POMDP model and do not need to construct a new model based around the modified updating function
$\mathrm{update}'$. While using a model $m/{\equiv}$, the actual observations made by the agent continue to lie in $O$, not
$O/{\equiv}$. Thus, to use the model $m/{\equiv}$ as a POMDP, one must map the observation $o$ to ${\equiv}[o]$ before updating
the agent’s beliefs.

An agent does not use the information conveyed by the distinctions among the observations $\mathcal{O} \subseteq O$ if
the agent plans using a POMDP $m/{\equiv}$ where all the observations of $\mathcal{O}$ are related by $\equiv$. Strictly speaking,
the agent uses the information if it plans with the model $m$ even if the strategies that it would choose under
$m$ and $m/{\equiv}$ are identical.

Note that if $\equiv$ relates every element of $O$ to one another, this does not imply that the agent never learns
anything.  For  example,  suppose  that  the  agent  can  perform  an  action  a  that  always  leads  to  a  single  state  sa. 
After  performing  this  action,  the  agent  will  learn  with  certainty  that  it  is  in  the  state  sa  even  in  the  absence  of 
any  meaningful  observations.  Thus,  our  formalism  cannot  capture  policies  that  restrict  an  agent  from  using 
information  about  what  actions  it  performed.  However,  we  have  not  seen  any  such  policies  in  practice. 

More  generally,  the  POMDP  model  itself  contains  information  about  the  agent’s  environment  that  the 
agent will continue to use. The auditor must ensure that the agent does not construct a model using prohibited
information. Typically, privacy policies govern information about specific individuals (e.g., a visitor’s
sex).  The  agents  subjected  to  auditing  typically  handle  many  such  individuals  (e.g.,  our  website  will  show 
advertisements  to  many  visitors).  Thus,  an  agent  typically  constructs  its  model  from  general  information 
(e.g.,  the  ratio  of  the  sexes)  and  uses  observations  to  make  it  parametric  in  the  individual.  Thus,  we  expect 
using  prohibited  information  to  create  a  model  would  be  conspicuous. 

At the opposite extreme, $m/{=}$ is a restricted POMDP that behaves identically to $m$ in that $=$ (i.e.,
equality) ignores no distinctions between any two observations. That is, for $m/{=}$, every observation $o$ is
mapped to the singleton $\{o\}$ in $O/{=}$, meaning that it is conflated with no other observations. This results in
$\mathrm{update}'_m(\beta, a, o, =) = \mathrm{update}_{m/{=}}(\beta, a, {=}[o])$ being equal to $\mathrm{update}_m(\beta, a, o)$.

3.3.2  Advertising  Example:  Information  Use 

Returning to our running example formally modeled in Section 3.2.2, the policy governing the website
states that the website will not use the database’s entry about the visitor’s sex for determining the advertisement
to show the visitor. The auditor must decide how to formally model this restriction. One way
would be to use the smallest equivalence relation $\equiv_{\mathrm{adv}}$ such that for all $d_1$ and $d_2$ in $\{\mathsf{f}, \mathsf{m}, \bot\}$ and all $a$
in $\{\mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3, \emptyset\}$, $(d_1, a) \equiv_{\mathrm{adv}} (d_2, a)$, conflating the database’s entry for all observations. Under this
equivalence relation, $m_{\mathrm{adv}}/{\equiv_{\mathrm{adv}}}$ is $(S, A, t, r, O/{\equiv_{\mathrm{adv}}}, \nu/{\equiv_{\mathrm{adv}}}, \gamma)$. $O/{\equiv_{\mathrm{adv}}}$ holds five observations created
from the observations of $O$ as the equivalence classes of $\equiv_{\mathrm{adv}}$. For each value of $a$ in $\{\mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3, \emptyset\}$,
$O/{\equiv_{\mathrm{adv}}}$ holds the observation $\{(\mathsf{f}, a), (\mathsf{m}, a), (\bot, a)\}$. Furthermore, $O/{\equiv_{\mathrm{adv}}}$ holds $\{o\}$ since $o$ is only
related to itself by $\equiv_{\mathrm{adv}}$. For all states $(g, d, a)$, $\nu/{\equiv_{\mathrm{adv}}}(\mathsf{lookup}, (g, d, a)) = \mathrm{degen}(\{(\mathsf{f}, a), (\mathsf{m}, a), (\bot, a)\})$.
For all states $s'$ and all $i$, $\nu/{\equiv_{\mathrm{adv}}}(\mathsf{ad}_i, s') = \mathrm{degen}(\{o\})$.

The website planning with the model $m_{\mathrm{adv}}/{\equiv_{\mathrm{adv}}}$ does not use any information from the database. In this
case, the website’s initial beliefs will solely determine its optimal strategies. Furthermore, under $m_{\mathrm{adv}}/{\equiv_{\mathrm{adv}}}$,
performing the action lookup will provide no benefit to the website since the website will conflate the observations
to ignore the information it provides. Any optimal strategy for $m_{\mathrm{adv}}/{\equiv_{\mathrm{adv}}}$ will call for performing
$\mathsf{ad}_2$ from the initial beliefs $\beta_0$ discussed in Section 3.2.2.
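
The claim that lookup provides no benefit once the database entries are conflated can be checked numerically. The Python sketch below is an illustration under assumed encodings (the strings `NONE` and `UNK` and all function names are hypothetical): it computes the modified update by conditioning only on the equivalence class of the observation, and compares it with the ordinary update, which corresponds to using singleton classes.

```python
NONE, UNK = "none", "unk"  # stand-ins for the empty ad slot and a missing entry

# Initial beliefs over states (g, d, a) from the advertising example.
beta0 = {
    ("f", "f", NONE): 0.4,
    ("f", UNK, NONE): 0.1,
    ("m", "m", NONE): 0.4,
    ("m", UNK, NONE): 0.1,
}

def nu_lookup(state):
    """After lookup, the observation is the database entry paired with the ad."""
    _, d, a = state
    return {(d, a): 1.0}

def class_adv(obs):
    """Class of a lookup observation when all database entries are conflated."""
    _, a = obs
    return frozenset((d, a) for d in ("f", "m", UNK))

def class_identity(obs):
    """Class of an observation under plain equality: a singleton."""
    return frozenset([obs])

def update_ignoring(beta, obs, eq_class):
    """Modified update for lookup: condition only on the class of obs.

    lookup leaves the state unchanged, so the predicted next-state
    distribution is beta itself.
    """
    cls = eq_class(obs)
    weights = {s: p * sum(nu_lookup(s).get(o1, 0.0) for o1 in cls)
               for s, p in beta.items()}
    norm = sum(weights.values())
    return {s: w / norm for s, w in weights.items() if w > 0}

conflated = update_ignoring(beta0, ("f", NONE), class_adv)
distinguished = update_ignoring(beta0, ("f", NONE), class_identity)
print(conflated)
print(distinguished)
```

With the entries conflated, the updated beliefs coincide with the initial beliefs, so the website learns nothing from lookup; with singleton classes, the same observation collapses the beliefs onto a single state. The sketch does not handle the dummy observation, which cannot arise from lookup.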

Alternatively, the auditor might conclude that the policy only forces the website to ignore whether the
database records the visitor as a female or male and not whether the database contains this information.
In this case, the auditor would use a different equivalence relation $\equiv'_{\mathrm{adv}}$ such that $(\mathsf{f}, a) \equiv'_{\mathrm{adv}} (\mathsf{m}, a)$ but
$(\mathsf{f}, a) \not\equiv'_{\mathrm{adv}} (\bot, a) \not\equiv'_{\mathrm{adv}} (\mathsf{m}, a)$ for all $a$ in $\{\mathsf{ad}_1, \mathsf{ad}_2, \mathsf{ad}_3, \emptyset\}$. Under the initial beliefs $\beta_0$, the website
would behave identically under $\equiv_{\mathrm{adv}}$ and $\equiv'_{\mathrm{adv}}$. However, if the website’s initial beliefs were such that it is
much more likely to not know a female’s sex than a male’s, then it might choose to show $\mathsf{ad}_1$ instead of $\mathsf{ad}_2$
in the case of observing $(\bot, \emptyset)$.

In our opinion, $\equiv'_{\mathrm{adv}}$ more accurately reflects the policy that the visitor’s sex will not be used for marketing.
We feel that the distinction between $(\bot, a)$ and $(\mathsf{f}, a)$ provides information about the visitor’s
presence in the database. However, an auditor may interpret the privacy policy differently or be enforcing
a privacy policy that clearly prohibits the use of any information about the visitor’s sex or willingness
to provide that information. In either of these cases, the auditor may require the auditee to treat all three
observations as indistinguishable.

3.3.3  Physician  Example:  Information  Use 

We return to the previously discussed example formally modeled in Section 3.2.3. The naming of actions in
that example conveys that the actions send$x_1$ and send$x_2$ each involve the information of the X-ray, which
is represented as the observations $x_1$ and $x_2$. An informal inspection of the POMDP $m_{\mathrm{phy}}$ reveals that we
intuitively agree that these actions involve that information.

It may be tempting to apply our formalism with the goal of showing that the send$x_1$ and send$x_2$ actions use
the information of X-rays. However, our formalization does not determine whether an action uses information.
The reason is that a single action could either use information or not depending upon why the agent
selected to perform that action. For example, despite our intuition that the action send$x_2$ uses the information
of the X-ray, if the physician decided to perform the action send$x_2$ regardless of any observations he
made, then that action does not use the information of the X-ray. Thus, rather than determine whether an
action uses information, our formalism determines whether an agent uses information based upon how the
agent plans.

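Our formalization does show that an agent performing the action send$x_2$ because the X-ray had the
value $x_2$ used that information. To see this result, consider the equivalence relation $\equiv_{\mathrm{phy}}$ that is the
smallest such that $x_1 \equiv_{\mathrm{phy}} x_2$ ($o$ is not related to anything but itself). Under this equivalence relation,
$m_{\mathrm{phy}}/{\equiv_{\mathrm{phy}}}$ is $(S, A, t, r, O/{\equiv_{\mathrm{phy}}}, \nu/{\equiv_{\mathrm{phy}}}, \gamma)$. The observation space $O/{\equiv_{\mathrm{phy}}}$ is $\{\{o\}, \{x_1, x_2\}\}$. $\nu/{\equiv_{\mathrm{phy}}}$
is such that $\nu/{\equiv_{\mathrm{phy}}}(\mathsf{take}, s_3)$ and $\nu/{\equiv_{\mathrm{phy}}}(\mathsf{take}, s_4)$ are each equal to $\mathrm{degen}(\{x_1, x_2\})$; and $\nu/{\equiv_{\mathrm{phy}}}(a, s') =
\mathrm{degen}(\{o\})$ for all other actions $a$ and states $s'$.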

Recall the initial beliefs $\beta_0$ discussed in Section 3.2.3 such that $\beta_0(s_1) = 0.9$ and $\beta_0(s_2) = 0.1$. Under
the model $m_{\mathrm{phy}}/{\equiv_{\mathrm{phy}}}$ that prevents using the X-ray, the physician will not learn what state he is in after
taking the X-ray. Thus, his beliefs about his current state will be determined by his initial beliefs. Assuming
the initial beliefs $\beta_0$, after taking the X-ray, the physician will first attempt to make a diagnosis himself since
$s_3$ is significantly more likely than $s_4$. However, in the case where $s_4$ is actually the current state, diagnosis
will not work and the physician will receive no reward. To account for this case, the physician will then
send the X-ray $x_2$ to the specialist and then try to make a diagnosis again.

Note that in this case, the physician sends the X-ray $x_2$ to the specialist regardless of the taken X-ray’s
value. Thus, the physician does not use the taken X-ray despite performing an action involving an
X-ray. This result is akin to the fact that a tabloid asserting that a celebrity has a disease based upon no
information about the celebrity cannot be accused of using medical information about the celebrity even
when the tabloid’s assertion is correct by luck. (While the tabloid did not violate the celebrity’s privacy
rights by using protected medical information, the tabloid may still have violated the celebrity’s rights for
other reasons.) This result may appear odd in the context of our example involving X-rays for two reasons.
First, the model is an abstraction of the example compressing many possible X-rays into just two values
($x_1$ and $x_2$) based solely upon whether the physician may reach a diagnosis from it. In a less abstract
model of the example, many different values would exist for the X-ray and the physician would only be able
to reach a diagnosis by sending the correct one to the specialist. Second, the example (originally from
Chapter 2) presupposes that the physician may use the X-ray to reach a diagnosis. Thus, we should expect
the quotient POMDP $m_{\mathrm{phy}}/{\equiv_{\mathrm{phy}}}$ to be counterintuitive. For this example, the real value of creating quotient
POMDPs lies in modeling purposes other than diagnosis for which the physician may not use the X-ray. Thus,
we discontinue this example until discussing auditing for other disallowed purposes in Section 3.4.5.


3.4  Auditing 

In the previous section, we presented the quotienting operator $\cdot/\cdot$ for restricting the information used in a
POMDP. This operation works identically for non-redundant POMDPs (NPOMDPs). Thus, we may use
information quotienting for auditing purpose restrictions.

Most policies do not categorically rule out using a type of information, but rather restrict the purposes
for which the auditee may use certain types of information. In this case, the auditor can construct an
NPOMDP $m$ that models optimizing the satisfaction of the purpose in question. The auditor must then
construct $m/{\equiv}$ where $\equiv$ is an equivalence relation constructed from the restrictions the policy puts on
information usage. The auditor may then check whether optimizing $m/{\equiv}$ can justify the auditee’s actions.

However, auditing for disallowed uses of information is more complex than auditing for disallowed actions.
Whereas actions are observable and in many contexts may be recorded in a log file, when information
is  only  used  in  the  planning  process  itself,  it  is  not  directly  observable.  Thus,  the  auditor  must  infer  from  the 
actions  of  the  auditee  what  information  it  used.  We  consider  this  process  for  both  prohibitive  and  exclusivity 
rules  before  discussing  automating  the  process  in  Section  3.5. 

3.4.1  Prohibitive  Rules 

Consider a rule of a privacy policy that demands that some information is not used for a certain purpose $p$ (a
prohibitive rule). The auditee must treat observations that only differ in that information as the same while
planning for that purpose. Thus, if the auditee is planning for the purpose $p$ and $m$ is a POMDP modeling
the auditee’s environment whose reward function measures the satisfaction of $p$, then the auditor may expect
that the auditee’s actions as recorded in the log are consistent with a strategy in $\mathrm{nopt}(m/{\equiv})$ where $\equiv$ relates
each observation to all the other observations that differ only by the restricted information. If the auditee’s
actions are inconsistent with every strategy in $\mathrm{nopt}(m/{\equiv})$, then the auditor knows that the auditee performed
one or more of the following acts:

1. performed actions for some purpose other than $p$,

2. used the prohibited information,

3. failed to properly optimize $m/{\equiv}$ despite trying (is incompetent), or

4. used some model other than $m$.

Each of these acts warrants the auditor’s attention.

The auditor may further check whether the auditee’s actions are consistent with $\mathrm{nopt}(m)$. If the actions
are inconsistent with both $\mathrm{nopt}(m)$ and $\mathrm{nopt}(m/{\equiv})$, then the auditee either planned for a different purpose,
was incompetent, or used a different model. If the auditee’s actions are consistent with $\mathrm{nopt}(m)$, but not
with $\mathrm{nopt}(m/{\equiv})$, then the auditor receives a strong suggestion that the auditee made use of the prohibited
information while planning for the purpose $p$. However, it does not prove that the auditee used the information
for the purpose $p$ since it could also be the case that the auditee’s actions result from the auditee
following a strategy for some other purpose.
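
The case analysis above can be summarized as a small decision procedure. The Python sketch below is only an illustration of that reasoning: the enum, its member names, and the function name are hypothetical, and the two boolean inputs are assumed to come from some separate check of the logged actions against the optimal non-redundant strategies of the full model and of the quotient model.

```python
from enum import Enum

class AuditFinding(Enum):
    # The log is explained by planning without the prohibited information.
    NO_VIOLATION_FOUND = "consistent with the quotient model"
    # Strong suggestion, not proof, that the prohibited information was used.
    SUGGESTS_PROHIBITED_USE = "consistent with the full model only"
    # Other purpose, incompetence, or a different model.
    UNEXPLAINED = "inconsistent with both models"

def interpret_audit(consistent_with_full, consistent_with_quotient):
    """Classify a log from its consistency with the two sets of strategies."""
    if consistent_with_quotient:
        return AuditFinding.NO_VIOLATION_FOUND
    if consistent_with_full:
        return AuditFinding.SUGGESTS_PROHIBITED_USE
    return AuditFinding.UNEXPLAINED

# A log that is optimal for the full model but not for the quotient model.
print(interpret_audit(consistent_with_full=True, consistent_with_quotient=False))
```

The middle outcome is deliberately named as a suggestion: as the text notes, the same actions could also result from a strategy for some other purpose.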

3.4.2  Advertising  Example:  Auditing  for  a  Prohibitive  Rule 

Consider  the  example  modeled  in  Section  3.2.2  and  discussed  in  Section  3.3.2  in  which  the  website  is 
prohibited  from  using  the  visitor’s  sex  for  the  purpose  of  marketing.  Let  the  initial  beliefs  /5q  be  as  before: 
/3o((f,f,0))  =  0.4,  /3o((f,_L,0))  =  0.1,  /3o((m,m,0))  =  0.4,  and  /30((m,  _L,  0))  =  0.1. 

Suppose that the log shows $[\beta_0, \mathsf{lookup}, (f, \emptyset), \beta_f, \mathsf{ad}_1]$ where $\beta_f$ is as defined in Section 3.2.2:
$\beta_f = \mathsf{degen}((f, f, \emptyset))$. In this case, the auditor can easily tell that the website used the prohibited
information since the intermediate belief state $\beta_f$ is equal to $\mathsf{update}_{m_{adv}}(\beta_0, \mathsf{lookup}, (f, \emptyset))$ but is not equal to
$\mathsf{update}_{m_{adv}/{\equiv_{adv}}}(\beta_0, \mathsf{lookup}, \{(f, \emptyset), (m, \emptyset)\})$ where the observation $\{(f, \emptyset), (m, \emptyset)\}$ is the equivalence
class of $(f, \emptyset)$ under $\equiv_{adv}$.
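To make the comparison of the two belief updates concrete, the sketch below recomputes them for the initial beliefs given above. It rests on assumptions that are ours rather than part of the model definition reproduced here: states are encoded as (sex, database entry, ads shown) triples, the lookup action leaves the state unchanged, and it deterministically reveals the database entry together with the set of ads shown. The names `update` and `lookup_obs` are hypothetical.

```python
from fractions import Fraction

EMPTY = frozenset()  # no advertisement shown yet
BOT = "bot"          # visitor absent from the database

def update(belief, obs_class, observe):
    """Bayesian belief update after a state-preserving action.

    belief:    dict mapping state -> probability
    obs_class: set of raw observations the agent cannot tell apart
    observe:   function state -> raw observation (assumed deterministic)
    """
    weights = {s: p for s, p in belief.items() if observe(s) in obs_class}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("observation has probability zero under this belief")
    return {s: p / total for s, p in weights.items()}

def lookup_obs(state):
    """Assumed observation produced by lookup: (database entry, ads shown)."""
    _sex, db, ads = state
    return (db, ads)

beta0 = {
    ("f", "f", EMPTY): Fraction(4, 10),
    ("f", BOT, EMPTY): Fraction(1, 10),
    ("m", "m", EMPTY): Fraction(4, 10),
    ("m", BOT, EMPTY): Fraction(1, 10),
}

# Full model: the observation (f, {}) is seen exactly.
with_info = update(beta0, {("f", EMPTY)}, lookup_obs)
# Quotient model: (f, {}) and (m, {}) are indistinguishable.
without_info = update(beta0, {("f", EMPTY), ("m", EMPTY)}, lookup_obs)
```

Under these assumptions, `with_info` is the degenerate belief on the state (f, f, {}), while `without_info` splits its mass evenly between (f, f, {}) and (m, m, {}), so a logged intermediate belief equal to the former cannot have come from the quotient model.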

While  not  represented  in  the  formal  model  madv  we  constructed  for  this  example,  in  many  contexts  the 
auditor  may  be  comfortable  with  the  assumption  that  the  only  purpose  for  which  the  auditee  could  have 
performed the action $\mathsf{ad}_1$ is advertising. Under this assumption, the auditor may determine that either the
website  used  the  information  for  a  disallowed  purpose  or  is  incompetent.  Thus,  while  the  auditor  cannot 
rule  out  with  certainty  the  possibility  that  some  other  purpose  led  to  the  auditee’s  actions,  the  auditor  may 
do  so  convincingly. 

The above example used the intermediate beliefs of the website. For some systems modeled as an MDP,
obtaining the intermediate states in a log seems plausible. However, the intermediate states under a belief
MDP correspond to the auditee’s subjective beliefs. Obtaining this information without asking the auditee,
who could lie, seems difficult if not impossible. However, even without access to $\beta_f$, the auditor may reach
the same conclusion provided he knows $\beta_0$. That is, suppose that the log just records that the website started
with initial beliefs $\beta_0$, performed $\mathsf{lookup}$, and then performed $\mathsf{ad}_1$. The auditor may determine the website’s
actions are consistent with $\mathsf{nopt}(m_{adv})$ but not $\mathsf{nopt}(m_{adv}/{\equiv_{adv}})$ since performing the action $\mathsf{ad}_2$ but not
$\mathsf{ad}_1$ is optimal for $m_{adv}/{\equiv_{adv}}$ under the initial beliefs $\beta_0$. In fact, the auditor does not even need to know that
the website performed $\mathsf{lookup}$ since performing $\mathsf{ad}_1$ implies that the website did so: without the information
made available by $\mathsf{lookup}$, the action $\mathsf{ad}_2$ optimizes $m_{adv}$ given the initial beliefs $\beta_0$. In this example, having
access to the initial beliefs $\beta_0$ seems conceivable if common knowledge includes that females and males are
equally likely, the database is correct and contains 80% of the visitors, and the visitor has not already
seen an advertisement.

Without access to the initial beliefs of the website, the auditor cannot conclude from just the fact that
the website performed the action $\mathsf{ad}_1$ that the website used the database’s entry on the visitor’s sex for
marketing. The website might have started with the initial belief that all visitors are female. In this case,
the website would optimally show $\mathsf{ad}_1$ without checking the database. While this uses the website’s beliefs
about the visitor’s sex, it does not use the restricted information, the database.

3.4.3  Exclusivity  Rules 

Consider  a  rule  of  a  privacy  policy  that  demands  that  some  information  is  used  only  for  a  certain  purpose  p 
(an  exclusivity  rule).  Given  the  logged  behaviors  of  the  auditee,  the  auditor  must  first  determine  whether  they 
could  be  for  the  purpose  p.  That  is,  the  auditor  must  determine  whether  they  are  consistent  with  nopt(m) 
where  m  is  a  model  for  optimizing  the  purpose  p.  If  they  could  be  for  the  purpose  p,  the  audit  finishes 
without  finding  any  violation. 

If the sequence of actions performed by the auditee cannot be for the purpose $p$, then the auditor must
determine whether the auditee used the restricted information in selecting the sequence of actions. This process
is similar to the process used for auditing prohibitive rules. However, now the auditor must find a second
purpose $p'$ that can explain the actions. If the actions are consistent with $\mathsf{nopt}(m')$ but not $\mathsf{nopt}(m'/{\equiv})$
where $m'$ is a model for the purpose $p'$ and $\equiv$ relates observations that only differ in the restricted
information, then the auditor may conclude that if the auditee performed the actions for the purpose $p'$, then it
violated the policy.

The auditor may repeat this process for every purpose $p'$ of which the auditor can conceive. For each such
alternative purpose $p'$ the auditor may test whether the actions of the auditee are consistent with optimizing
$m'/{\equiv}$ where $m'$ is a POMDP for the purpose $p'$. If the auditor finds an alternative $p'$ such that the auditee’s
actions are consistent with optimizing $m'/{\equiv}$, then the auditor has found an innocent explanation for the
auditee’s behavior. If the auditor can find no such purpose, then the auditor may be fairly confident that
the auditee violated the policy. However, the auditor cannot be absolutely certain since the auditor might
have failed to conceive of an exculpatory purpose or the auditee could be incompetent and failed to correctly
optimize for a purpose while using only allowed information.

Alternatively,  the  auditor  may  ask  the  auditee  to  supply  the  alternative  purpose  p'  and  the  auditor  may 
check  only  that  purpose.  If  the  auditee  cannot  provide  an  alternative  purpose  p'  such  that  its  actions  are 
consistent  with  optimizing  p'  without  using  the  restricted  information,  the  auditor  may  conclude  that  the 
auditee  either  violated  the  policy  or  is  incompetent. 
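The exclusivity audit described in this section can be sketched as a short procedure. This is a minimal illustration under our own assumptions: the function name `audit_exclusivity`, the string verdicts, and the two consistency oracles (`consistent_full` for the check against nopt(m) and `consistent_quotient` for the check against nopt(m'/≡)) are hypothetical stand-ins for checks the auditor would perform by other means.

```python
def audit_exclusivity(behavior, model_for_p, alternatives,
                      consistent_full, consistent_quotient):
    """Sketch of an audit for an exclusivity rule.

    behavior:            the logged behavior of the auditee
    model_for_p:         model m for the one purpose p that may use the information
    alternatives:        iterable of (purpose_name, model) pairs to try as p'
    consistent_full:     oracle(model, behavior) -> bool, check against nopt(m)
    consistent_quotient: oracle(model, behavior) -> bool, check against nopt(m'/≡)
    """
    # Step 1: the behavior could be for the allowed purpose p.
    if consistent_full(model_for_p, behavior):
        return ("no violation found", None)
    # Step 2: look for a purpose p' that explains the behavior
    # without using the restricted information.
    for name, alt_model in alternatives:
        if consistent_quotient(alt_model, behavior):
            return ("innocent explanation", name)
    # Step 3: no explanation found among the purposes considered.
    return ("violation or incompetence suspected", None)
```

The final verdict is only as strong as the list of alternatives: as noted above, the auditor may have failed to conceive of an exculpatory purpose.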

3.4.4  Advertising  Example:  Auditing  for  an  Exclusivity  Rule 

Again returning to the example formalized as $m_{adv}$, suppose that the policy stated that a visitor’s sex may
only be used for the purpose of identification. Suppose that the website performed the action $\mathsf{lookup}$
followed by $\mathsf{ad}_1$. Under the assumption that the website would only perform the action $\mathsf{ad}_1$ for the purpose
of advertising, the auditor may focus on advertising as the single alternative purpose. As explained in the
example for prohibitive rules (Section 3.4.2), the auditor may conclude that the auditee used the visitor’s
sex for the purpose of marketing, indicating a violation of this exclusivity rule.

3.4.5  Physician  Example:  Auditing  for  an  Exclusivity  Rule 

Let us return to the physician example formalized in Section 3.2.3. In this example, the physician is allowed
to use the X-ray only for diagnosis. The physician starts with initial beliefs $\beta_0$ such that $\beta_0(s_1) = 0.9$,
$\beta_0(s_2) = 0.1$, and $\beta_0(s) = 0$ for all other states $s$.

Suppose the auditor has a log that shows that the physician starts with the initial beliefs $\beta_0$, performs the
action $\mathsf{take}$, observes $x_1$, and performs $\mathsf{send}_{x_1}$. Despite not recording the intermediate beliefs of the
physician, this log shows that the physician’s actions were not for the purpose of diagnosis since the physician
could make the diagnosis without sending the X-ray to the specialist.

Since these actions are not for the purpose of the diagnosis, the physician is prohibited from using the
X-ray while selecting these actions. If the physician obeyed the policy, there must exist another purpose $p'$
such that these actions are optimal for $m'/{\equiv_{phy}}$ where $m'$ is a model for satisfying $p'$ and $\equiv_{phy}$ relates the
two X-rays. The existence of such a purpose may strike the auditor as highly suspect since the primary effect
of the action $\mathsf{take}$ is to produce information that the physician must ignore. Furthermore, that the physician
observes $x_1$ and then performs the corresponding action $\mathsf{send}_{x_1}$ is a striking coincidence.

The auditor may decide to interview the physician to determine what purpose he might have had in
mind.  The  physician  could  offer  purposes  that  would  explain  the  actions  without  using  the  X-ray.  For 
example,  the  alternative  purpose  of  increasing  costs  is  served  by  taking  an  unused  X-ray  and  sending  some 
X-ray  (not  necessarily  the  correct  one)  to  a  specialist.  That  the  X-ray  taken  and  sent  could  be  the  same  by 
coincidence  is  possible  in  this  example.  (In  a  more  realistic  example  with  many  more  possible  X-rays,  it 
becomes  significantly  less  plausible.)  However,  while  not  previously  discussed,  the  physician  is  likely  to  be 
governed  by  additional  policies  that  would  make  such  a  purpose  illegitimate. 

If  the  physician  cannot  produce  a  legitimate  purpose  for  which  his  actions  are  optimal  and  non-redundant, 
then  the  auditor  has  found  that  the  physician  committed  a  violation.  The  nature  of  the  physician’s  violation 
could take one of two forms. First, it might have been a violation of the prohibition against using the X-ray
for  a  purpose  other  than  diagnosis.  Second,  the  physician  might  have  performed  an  action  for  an  otherwise 
illegitimate  purpose. 


3.5  Auditing  Algorithm 

In this section, we provide an algorithm that determines whether a behavior could have resulted from
optimizing a POMDP modeling a purpose and quotiented by an equivalence relation modeling a restricted
class of information. The above examples (Sections 3.4.2, 3.4.4, and 3.4.5) illustrate not only that such an
algorithm can aid an auditor, but also that the auditor must make numerous other determinations. For example,
the auditor must also determine whether a purpose is illegitimate for an auditee given all the purpose
restrictions of a policy. We leave automating these other determinations to future work.

Furthermore, while the above examples illustrate cases where the auditor may make determinations
without access to the auditee’s beliefs, we focus on the case where the auditor interviews the auditee to determine
its  (purported)  belief  states.  The  auditor  then  checks  whether  the  auditee’s  story  is  consistent  with  itself  and 
with  any  logs  that  the  auditor  has.  Our  algorithm  aids  the  auditor  in  determining  whether  the  auditee’s  story 
is  consistent. 

Performing auditing in this fashion must be more focused than auditing using the algorithm of
Section 2.3. The degree of automation of that algorithm allows the auditor to run it looking for suspicious actions
for every auditee. The costs of interviewing auditees may prohibit such routine auditing in the case of
POMDPs. Rather, our approach for auditing POMDPs is better restricted to investigating auditees found to
be suspicious through other means.

AuditNPOMDPapprox($m = (S, A, t, r, O, \nu, \gamma)$, $\equiv$, $b = [\beta_1, a_1, o_1, \beta_2, a_2, o_2, \ldots, \beta_n, a_n, o_n]$):

31  $m' := (S, A, t, r, O/{\equiv}, \nu/{\equiv}, \gamma)$

32  if (ImpossiblePOMDP($m'$, $b$))

33      return true  // behavior impossible for NPOMDP $m'$

34  $(V^*_{low}, V^*_{up})$ := SolvePOMDPapprox($m'$)

35  for ($i := 1$; $i \le n$; $i$++):

36      if ($Q^*(m', V^*_{up}, \beta_i, a_i) < V^*_{low}(\beta_i)$):

37          return true  // action suboptimal

38      if ($Q^*(m', V^*_{up}, \beta_i, a_i) \le 0$ and $a_i \ne \mathsf{stop}$):

39          return true  // action redundant

40  return false


Figure 3.4: The algorithm AuditNPOMDPapprox. $Q^*$ is defined in text.


While we provide both an exact and an approximation algorithm for MDPs, we only provide an
approximation algorithm for POMDPs since exactly solving POMDPs is undecidable [Mad00] (conjectured earlier
in [PT87]). Figure 3.4 shows our algorithm, called AuditNPOMDPapprox. The core of the algorithm is
similar to AuditNMDPapprox in that it also checks whether each action that the auditee performed is optimal
for the state from which the auditee performed the action. This core of the algorithm is also closely related
to goal inference algorithms that use POMDPs [BST11, RG11]. (See Section 7.4 for a detailed discussion.)
However, AuditNPOMDPapprox differs from these algorithms by considering information use.

The algorithm takes as inputs a POMDP $m$, an equivalence relation $\equiv$, and a log that records a behavior
$b = [\beta_1, a_1, o_1, \beta_2, a_2, o_2, \ldots, \beta_n, a_n, o_n]$ such that the audited agent is operating in the environment $m$
under a policy prohibiting information as described by $\equiv$ and took action $a_i$ from belief state $\beta_i$ for all
$i \le n$. AuditNPOMDPapprox returns whether the agent’s behavior, as recorded in $b$, is inconsistent with
optimizing the POMDP $m/{\equiv}$.

The  inputs  to  AuditNPOMDPapprox  may  either  be  created  by  the  auditor  based  upon  his  examination 
of  the  auditee’s  behavior  or  constructed  by  the  auditee  and  provided  to  the  auditor.  In  the  second  case,  the 
auditor  must  be  mindful  that  the  auditee  might  provide  false  information.  However,  even  in  this  case,  the 
algorithm can still help the auditor determine whether the auditee’s explanation is consistent.

AuditNPOMDPapprox operates by first constructing the quotient POMDP $m' = m/{\equiv}$ from $m$ and
$\equiv$ using its definition. Second, it uses the sub-routine ImpossiblePOMDP($m'$, $b$), shown in Figure 3.5, to
check whether the given behavior $b$ is possible for $m'$. For each $i$, AuditNPOMDPapprox then checks
whether performing the recorded action $a_i$ in belief state $\beta_i$ is optimal under $m/{\equiv}$. We use an approximation
algorithm to solve for the value of performing $a_i$ in $\beta_i$ (i.e., $Q^*_{m/{\equiv}}(\beta_i, a_i)$) and the optimal value $V^*_{m/{\equiv}}(\beta_i)$.
For soundness, we require an approximation algorithm SolvePOMDPapprox that produces both lower
bounds $V^*_{low}$ and upper bounds $V^*_{up}$ on $V^*_{m/{\equiv}}(\beta_i)$. Many such algorithms exist (e.g., [Son78, KLC98, ZH01,
SS05, KHL08, PKK11]). For each $\beta_i$ and $a_i$ in $b$, AuditNPOMDPapprox checks whether these bounds
show that $Q^*_{m/{\equiv}}(\beta_i, a_i)$ is strictly less than $V^*_{m/{\equiv}}(\beta_i)$. If so, then the action $a_i$ is sub-optimal for $\beta_i$ and
AuditNPOMDPapprox returns true.
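The main loop of the algorithm can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `audit_npomdp_approx` and its callable parameters are hypothetical, and we assume the quotient model has already been built and approximately solved, so that `q_upper(belief, action)` returns an upper bound on the action value and `v_lower(belief)` a lower bound on the optimal value.

```python
def audit_npomdp_approx(behavior, impossible, q_upper, v_lower, stop="stop"):
    """Return True when the behavior is provably inconsistent with optimizing
    the (already quotiented) model; False means no inconsistency was found.

    behavior:   list of (belief, action, observation) triples
    impossible: callable(behavior) -> bool, the possibility pre-check
    q_upper:    callable(belief, action) -> upper bound on the action value
    v_lower:    callable(belief) -> lower bound on the optimal value
    """
    if impossible(behavior):
        return True  # behavior impossible for the model
    for belief, action, _obs in behavior:
        q_up = q_upper(belief, action)
        if q_up < v_lower(belief):
            return True  # even the optimistic value is below the optimum
        if q_up <= 0 and action != stop:
            return True  # action no better than stopping: redundant
    return False
```

Because the comparisons use an upper bound on one side and a lower bound on the other, a result of True does not depend on how tight the approximation is; looser bounds only make the check return False more often.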

51  for ($i := 1$; $i \le n$; $i$++):

52      if ($\beta_i \notin \mathsf{Dist}(S)$):

53          return true  // $\beta_i$ is not a belief state

54      if ($a_i \notin A$):

55          return true  // $a_i$ is not an action

56      if ($o_i \notin O$):

57          return true  // $o_i$ is not an observation

58  for ($i := 1$; $i < n$; $i$++):

59      if ($\mathsf{update}_m(\beta_i, a_i, o_i) \ne \beta_{i+1}$):

60          return true  // $\beta_{i+1}$ unreachable from $\beta_i$ under $a_i$ and $o_i$

61      for ($j := i + 1$; $j \le n$; $j$++):

62          if ($\beta_i = \beta_j$ and $a_i \ne a_j$):

63              return true  // no stationary strategy could have produced the behavior

64  return false

Figure 3.5: The algorithm ImpossiblePOMDP. Returns whether the given behavior is impossible for the given
POMDP.

$Q^*(m', V^*_{up}, \beta, a)$ is a function that uses $V^*_{up}$ to return an upper bound on $Q^*_{m/{\equiv}}(\beta, a)$. $Q^*(m', V^*_{up}, \beta, a)$
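equals:

$$R_{m'}(\beta, a) + \gamma \sum_{O \in \mathcal{O}/{\equiv}} N_{m'}(\beta, a, O) \cdot V^*_{up}(\mathsf{update}_{m'}(\beta, a, O))$$

with $R_{m'}$ and $N_{m'}$ as defined in Equations 3.7 and 3.8 of Section 3.2.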
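A one-step backup of this kind can be sketched in code. The sketch assumes the standard belief-MDP form, in which the bound is the expected immediate reward plus the discounted, probability-weighted upper bound on the value of each successor belief. The parameter names `reward`, `obs_prob`, `update`, and `v_upper` are our own stand-ins for the expected reward, the observation probability, the belief update, and the upper bound on the optimal value; their precise definitions are the ones given in Section 3.2, which this sketch does not reproduce.

```python
def q_upper_bound(belief, action, reward, obs_prob, update, v_upper,
                  observations, gamma):
    """One-step backup giving an upper bound on the value of `action` in `belief`.

    reward(belief, action)           -> expected immediate reward
    obs_prob(belief, action, obs)    -> probability of seeing `obs`
    update(belief, action, obs)      -> successor belief
    v_upper(belief)                  -> upper bound on the optimal value
    observations                     -> iterable of (equivalence classes of) observations
    gamma                            -> discount factor
    """
    total = reward(belief, action)
    for obs in observations:
        p = obs_prob(belief, action, obs)
        if p > 0:  # the successor belief may be undefined for impossible observations
            total += gamma * p * v_upper(update(belief, action, obs))
    return total
```

Since `v_upper` overestimates the value of every successor belief, the returned number overestimates the true action value, which is the property the soundness argument relies on.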

3.5.1  Correctness 

As proved in Theorem 4, AuditNPOMDPapprox is sound: if AuditNPOMDPapprox($m$, $\equiv$, $b$) returns
true, then the behavior $b$ is not a non-redundant and optimal behavior for the NPOMDP $m/{\equiv}$. If the
auditor provides to AuditNPOMDPapprox a model $m$ that accurately describes the environment of an
auditee, an equivalence relation $\equiv$ that accurately characterizes the information restrictions of a policy, and a
behavior $b$ that accurately describes the behavior of the auditee, then AuditNPOMDPapprox($m$, $\equiv$, $b$)
returning true implies that the auditee deviated from behavior acceptable under the policy. This deviation
could either be the auditee optimizing some other purpose, using information it should not have, using a
different POMDP model of its environment, or failing to correctly optimize the POMDP. Each of these
possibilities should concern the auditor and is worthy of further investigation.

The  proof  of  correctness  follows  the  same  outline  as  the  proof  for  the  MDP  approximation  algorithm. 
First,  we  show  local  conditions  for  testing  the  global  property  of  a  behavior  being  non-redundant  and  optimal 
(Lemma  5).  Second,  we  reason  about  the  code  to  show  that  it  correctly  checks  these  local  conditions 
(Lemma  6  and  Theorem  4). 

The following lemma (Lemma 5) shows that a behavior $b$ being optimal and non-redundant for an
NPOMDP $m$ (i.e., that $b$ is in $\mathsf{nbehv}(m)$) implies the following three properties about $b$. First, $b$ must
actually be a possible behavior of $m$. Second, every action $a_i$ in $b$ must be optimal for the belief state $\beta_i$
in which the auditee performed the action $a_i$ (i.e., $Q^*_m(\beta_i, a_i) = V^*_m(\beta_i)$). Third, every action $a_i$ in $b$ must
be non-redundant in that it must produce a higher expected total discounted reward when taken in the belief
state $\beta_i$ in which the auditee performed it than the action $\mathsf{stop}$ when performed from $\beta_i$ (i.e., $a_i \ne \mathsf{stop}$
implies that $Q^*_m(\beta_i, a_i) > Q^*_m(\beta_i, \mathsf{stop}) = 0$).

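Lemma 5. For all NPOMDPs $m$, if the behavior $b = [\beta_1, a_1, o_1, \ldots, \beta_n, a_n, o_n]$ is in $\mathsf{nbehv}(m)$, then $b$ is a
possible behavior of $m$, and for all $i \le n$, $Q^*_m(\beta_i, a_i) = V^*_m(\beta_i)$ and $a_i \ne \mathsf{stop}$ implies that $Q^*_m(\beta_i, a_i) > 0$.

Proof. Suppose that $b$ is in $\mathsf{nbehv}(m)$. By the definition of $\mathsf{nbehv}(m)$, $b$ is a possible behavior of $m$ and
$b' = [\beta_1, a_1, \beta_2, a_2, \ldots, \beta_n, a_n]$ is in $\mathsf{nbehv}(\mathsf{bmdp}(m))$. By the definition of possible for POMDPs, $b'$ is a
possible behavior of $\mathsf{bmdp}(m)$ and for all $i < n$, $\beta_{i+1} = \mathsf{update}_m(\beta_i, a_i, o_i)$.

By Lemma 3, $b'$ being in $\mathsf{nbehv}(\mathsf{bmdp}(m))$ implies that $b'$ is a possible behavior of $\mathsf{bmdp}(m)$, and for
all $i \le n$, $q^*_{\mathsf{bmdp}(m)}(\beta_i, a_i) = v^*_{\mathsf{bmdp}(m)}(\beta_i)$ and $a_i \ne \mathsf{stop}$ implies that $q^*_{\mathsf{bmdp}(m)}(\beta_i, a_i) > 0$.

By Proposition 3, for all $i \le n$, $v^*_{\mathsf{bmdp}(m)}(\beta_i) = V^*_m(\beta_i)$ and $q^*_{\mathsf{bmdp}(m)}(\beta_i, a_i) = Q^*_m(\beta_i, a_i)$. Thus,
for all $i \le n$, $Q^*_m(\beta_i, a_i) = V^*_m(\beta_i)$ since $q^*_{\mathsf{bmdp}(m)}(\beta_i, a_i) = v^*_{\mathsf{bmdp}(m)}(\beta_i)$. Furthermore, for all $i \le n$,
$a_i \ne \mathsf{stop}$ implies that $Q^*_m(\beta_i, a_i) > 0$ since for all $i \le n$, $a_i \ne \mathsf{stop}$ implies that $q^*_{\mathsf{bmdp}(m)}(\beta_i, a_i) > 0$. □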

AuditNPOMDPapprox checks each of the three conditions that Lemma 5 shows are implied by a
behavior being optimal and non-redundant. If any of them are false, then the behavior is not optimal and
non-redundant by the contrapositive of Lemma 5. The following lemma (Lemma 6) shows that the
sub-routine ImpossiblePOMDP($m$, $b$) correctly checks whether the behavior $b$ is possible for the NPOMDP
$m$.

Lemma 6. For all POMDPs $m$ and finite sequences $b$, if ImpossiblePOMDP($m$, $b$) returns true, then $b$
is not a possible behavior of $m$.

Proof. $b$ must have the form $[\beta_1, a_1, o_1, \beta_2, a_2, o_2, \ldots, \beta_n, a_n, o_n]$ for some $n$. If ImpossiblePOMDP returns
true, then at least one of the following is true: (1) there exists $i \le n$ such that $\beta_i$ is not a belief state of $m$
(Line 53), (2) there exists $i \le n$ such that $a_i$ is not an action of $m$ (Line 55), (3) there exists $i \le n$ such
that $o_i$ is not an observation of $m$ (Line 57), (4) there exists $i < n$ such that $\mathsf{update}_m(\beta_i, a_i, o_i) \ne \beta_{i+1}$
(Line 60), or (5) there exists $i < n$ and $j$ where $i < j \le n$ such that $\beta_i = \beta_j$ and $a_i \ne a_j$ (Line 63).
Conditions (1), (2), (3) each imply that $b$ is not in $(\mathsf{Dist}(S) \times A \times O)^*$, which implies that $b$ is not a possible
behavior for $m$. Condition (4) implies that $b$ is not a possible behavior of $m$ as well.

Let $b' = [\beta_1, a_1, \beta_2, a_2, \ldots, \beta_n, a_n]$. If Condition (5) holds, there exists $i < n$ and $j$ where $i < j \le n$
such that $\beta_i = \beta_j$ and $a_i \ne a_j$. This implies that $b'$ is not a possible behavior of $\mathsf{bmdp}(m)$ by Lemma 1.
Thus, $b$ is not a possible behavior of $m$. □
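The five conditions used in this proof correspond directly to checks that are easy to express in code. The sketch below is only an illustration under our own assumptions: the function name `impossible_pomdp` and its parameters are hypothetical, beliefs are assumed to be values that can be compared with `==` (for example tuples of exact fractions, since floating-point beliefs would need a tolerance), and the belief update is supplied by the caller.

```python
def impossible_pomdp(behavior, is_belief, actions, observations, update):
    """Return True if the behavior cannot arise from any stationary strategy.

    behavior:     list of (belief, action, observation) triples
    is_belief:    callable(belief) -> bool, membership in the belief space
    actions:      set of valid actions
    observations: set of valid observations
    update:       callable(belief, action, observation) -> successor belief
    """
    n = len(behavior)
    # Conditions (1)-(3): every element must be well formed.
    for belief, action, obs in behavior:
        if not is_belief(belief):
            return True
        if action not in actions:
            return True
        if obs not in observations:
            return True
    # Condition (4): consecutive beliefs must agree with the belief update.
    for i in range(n - 1):
        b_i, a_i, o_i = behavior[i]
        if update(b_i, a_i, o_i) != behavior[i + 1][0]:
            return True
    # Condition (5): a stationary strategy picks one action per belief.
    for i in range(n):
        for j in range(i + 1, n):
            if behavior[i][0] == behavior[j][0] and behavior[i][1] != behavior[j][1]:
                return True
    return False
```

The pairwise check for condition (5) is written as a separate double loop for readability; it examines the same pairs as the nested loop in the figure.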

The soundness theorem below (Theorem 4) reasons about the code of AuditNPOMDPapprox to
show that it correctly checks the local conditions mentioned by Lemma 5. It uses Lemma 6 to justify the
sub-routine ImpossiblePOMDP.

Theorem 4 (Soundness). For all POMDPs $m$, equivalence relations $\equiv$ over the observation space of $m$,
and finite sequences $b$, if AuditNPOMDPapprox($m$, $\equiv$, $b$) returns true, then $b$ is not in $\mathsf{nbehv}(m/{\equiv})$.

Proof. $b$ must have the form $[\beta_1, a_1, o_1, \beta_2, a_2, o_2, \ldots, \beta_n, a_n, o_n]$ for some $n$. If AuditNPOMDPapprox
returns true, then at least one of the following conditions is true: (1) ImpossiblePOMDP($m'$, $b$) returns
true (Line 33), (2) there exists $i \le n$ such that $Q^*(m', V^*_{up}, \beta_i, a_i) < V^*_{low}(\beta_i)$ (Line 37), or (3) there exists
$i \le n$ such that $Q^*(m', V^*_{up}, \beta_i, a_i) \le 0$ and $a_i \ne \mathsf{stop}$ (Line 39). If (1) is true, then $b$ is not a possible
behavior of $m'$ by Lemma 6. If (2) is true, then for that $i$, $Q^*_{m'}(\beta_i, a_i) \ne V^*_{m'}(\beta_i)$ since
$Q^*_{m'}(\beta_i, a_i) \le Q^*(m', V^*_{up}, \beta_i, a_i) < V^*_{low}(\beta_i) \le V^*_{m'}(\beta_i)$. If (3) is true, then for that $i$, $a_i \ne \mathsf{stop}$ does not imply that
$Q^*_{m'}(\beta_i, a_i) > 0$ since $a_i \ne \mathsf{stop}$ and $Q^*_{m'}(\beta_i, a_i) \le Q^*(m', V^*_{up}, \beta_i, a_i) \le 0$. Thus, under each of these
cases, Lemma 5 shows that $b$ is not in $\mathsf{nbehv}(m')$. This fact implies that $\log^{-1}(b) \cap \mathsf{nbehv}(m/{\equiv})$ is empty
since $\log^{-1}(b) = \{b\}$ and $m' = m/{\equiv}$. □

Thus,  if  AuditNPOMDPapprox  returns  true,  either  the  agent  optimized  some  other  purpose,  used 
information  it  should  not  have,  used  a  different  POMDP  model  of  its  environment,  or  failed  to  correctly 
optimize  the  POMDP. 

If the algorithm returns false, then the auditor cannot find the agent’s behavior inconsistent with an optimal
strategy and the auditor should spend his time auditing other agents. However, AuditNPOMDPapprox is
incomplete and such a finding does not mean that the agent surely obeyed the policy. For one, a better
approximation of $V^*_{m/{\equiv}}$ might actually show that $Q^*_{m/{\equiv}}(\beta_i, a_i) < V^*_{m/{\equiv}}(\beta_i)$ for some $i$. More fundamentally,
incompleteness remains even with an exact POMDP solver: it is possible that the agent was planning with a
different purpose in mind or that the agent used disallowed information, but that, by coincidence, the agent
performed actions consistent with the allowed purpose and information use. While the auditor might want to
find such illicit motivations, the agent has tenable deniability and the auditor cannot determine whether the
agent obeyed the policy. Given the impossibility of such a determination and that the behavior is consistent
with allowed behavior (the agent did the right thing for the wrong reasons), the auditor’s time is better spent
elsewhere.

Chapter 4


Application  to  Medical  Records 


4.1  The  Healthcare  Domain 

In  this  chapter,  we  apply  our  work  to  the  healthcare  domain.  The  move  to  electronic  health  records  both 
introduces  new  threats  to  the  privacy  of  patients  and  allows  old  ones  to  be  addressed  more  completely  than 
before.  Some  of  these  threats  are: 

1.  curious  employees  looking  up  celebrity  records, 

2.  curious  employees  looking  up  the  records  of  people  they  know, 

3.  massive  data  loss  from  an  insider  copying  records, 

4.  massive  data  loss  from  accidents  such  as  losing  a  laptop, 

5.  massive  data  loss  from  outsiders  attacking  the  system, 

6.  healthcare  providers  systematically  taking  liberties  with  privacy  for  profit  by  taking  advantage  of 
vague  laws, 

7. honest employees inadvertently violating privacy, and

8.  honest  employees  not  sharing  needed  information  despite  being  allowed  to. 

The threats involving massive data loss are enabled by the move to electronic health records as paper
records are impractical to steal or copy in large quantities. Threat 6 is exacerbated by electronic health records
(EHRs) as they make new uses of information profitable such as data mining for corporate research. We
hope the improved auditing abilities of EHRs will reduce the risks of the remaining ones.

The first three involve purpose-based violations. The next two appear to be standard security problems
and,  thus,  we  do  not  consider  them  further.  The  antepenultimate  one  could  be  a  purpose  violation  depending 
upon  one’s  interpretation  of  the  law.  The  last  two  often  involve  purposes. 

Our formalism may mitigate those threats involving purposes. For example, it provides the basis for
automated auditing, which could discourage curious employees. In addition to clarifying the meaning of
purpose restrictions found in laws, our formalism may aid understanding other vague requirements. For
example, HIPAA requires healthcare providers to limit disclosure of medical records to the “minimum
necessary to accomplish the intended purpose” [U.S10a]. Our formalism for purpose may be combined
with a formalism for minimum necessary to understand this requirement.

In  the  remainder  of  this  chapter,  we  explore  one  possible  use  of  our  formalism  in  the  healthcare  domain. 
We  look  at  how  it  can  help  policy  subjects  understand  the  implications  of  a  policy  in  the  emerging  domain 
of Regional Health Information Organizations (RHIOs). The American Recovery and Reinvestment Act
provides  funding  to  promote  RHIOs.  RHIOs  are  to  collect  and  store  health  records  for  individuals  living  in 
a  defined  region.  Health  care  providers  working  in  those  areas  may  gain  access  to  patient  records  from  their 
local  RHIO.  By  making  records  more  available,  RHIOs  hope  to  improve  patient  care  while  lowering  costs. 

As  RHIOs  are  a  new  technology  and  do  not  directly  provide  treatment,  arguments  may  arise  over  whether 
their  use  is  actually  for  treatment.  A  physician  considering  reading  such  a  record  may  find  the  circumstances 
too  complex  to  understand  without  help.  However,  we  cannot  expect  the  physician  to  perform  the  modeling 
required  to  use  our  auditing  algorithm  either. 

To  shed  light  on  this  issue,  we  show  how  to  apply  our  formalism  to  two  uses  of  RHIOs  and  ask  whether 
they  are  for  the  purpose  of  treatment.  Compliance  officers  at  an  RHIO  may  use  our  algorithm  to  audit 
simulated  logs  of  possible  future  uses  and  determine  which  actions  the  restriction  allows.  The  compliance 
officers  may  generalize  these  quantitative  results  to  a  qualitative  operating  procedure,  such  as  the  physician 
may  read  records  of  patients  with  whom  he  does  not  have  a  current  relationship  only  when  seeing  that 
patient  in  the  future  is  highly  likely. 

Below,  we  show  an  example  of  reasoning  that  could  lead  to  this  procedure.  First,  we  look  at  uploading 
information to an RHIO. Second, we look at reading information from an RHIO. Our examination of
reading information considers multiple different models of an RHIO of varying complexity. After starting
with  a  simple  model,  we  consider  extensions  modeling  other  actions  a  physician  may  take,  how  reading  a 
patient’s  record  could  help  other  patients,  multiple  time  steps,  and  learning  information.  We  find  that  in 
some  cases,  a  physician  is  justified  in  reading  records  of  patients  with  whom  the  physician  does  not  have  a 
current  relationship. 

Compliance  officers  at  an  RHIO  may  find  these  results  helpful  while  creating  operating  procedures.  We 
find, for example, under a model of a large hospital, that a physician should not read a patient’s record unless
the  physician  has  a  reason  to  believe  that  the  patient  is  much  more  likely  than  average  to  seek  care  (under 
the  policy  that  physicians  may  read  patient  records  only  for  treatment).  However,  the  policy  is  more  lenient 
for  a  model  of  a  small  hospital. 

4.2  Uploading  Information 

Our formalism shows that uploading information to RHIOs is for the purpose of treatment even when it
goes unused. For example, consider a physician Dr. X who uploads a patient Y’s record to an RHIO, where it
goes unused. Nevertheless, Dr. X’s action is for treatment since a reasonable model would show that, with some
probability, patient Y could end up in an accident resulting in an emergency room visit at a hospital he has
never attended before. In such a case, immediate access to Y’s record from the RHIO would improve the
abilities of the emergency physicians to treat Y. Thus, under our formalism of purpose, Dr. X’s posting the
medical record is for treatment.

Figure 4.1 shows such a model. In it, the physician has a choice between posting the record or not.
Either way, with some probability $p$, the patient will need care from a facility that has access to the RHIO.
In the case where Dr. X posted the record, the facility provides treatment at a high quality level of $r_1$. In the
case where Dr. X did not post the record, the facility delivers treatment at a lower level $r_2$ where $r_1 > r_2$.
For such a model, the optimal action is to post the record. Implicit in the model is that posting a patient’s
record does not alter the probability that the patient will seek care.

Figure 4.1: MDP representing posting records to an RHIO. $r_1 > r_2$.
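The comparison behind this conclusion is a one-line expected-value calculation, sketched below. The function name `posting_values` and the numbers in the usage line are hypothetical, and the sketch assumes a single decision with no discounting and no reward other than the quality of treatment delivered when the patient needs care.

```python
def posting_values(p_need_care: float, r1: float, r2: float) -> dict:
    """Expected treatment quality for each choice in the posting model.

    p_need_care: probability that the patient needs care at a facility
                 with access to the RHIO (same for both choices)
    r1:          quality of treatment when the record was posted
    r2:          quality of treatment when it was not (r1 > r2)
    """
    if not 0 <= p_need_care <= 1:
        raise ValueError("p_need_care must be a probability")
    if not r1 > r2:
        raise ValueError("the model assumes r1 > r2")
    return {"post": p_need_care * r1, "no_post": p_need_care * r2}

# Illustrative numbers only: a 10% chance of needing care.
values = posting_values(0.1, 10.0, 6.0)
best = max(values, key=values.get)
```

For any positive probability of needing care, posting has the strictly higher expected value, which is why the record going unused in hindsight does not change the verdict.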


4.3  Reading  Information 

Now  we  consider  whether  a  physician  may  read  a  record  of  an  RHIO  for  the  purpose  of  treatment  even  when 
that  physician  currently  has  no  plans  to  treat  that  patient.  At  first,  the  answer  appears  to  be  clearly  no  as  a 
physician  who  is  not  engaged  in  treating  a  patient  appears  to  have  no  reason  to  look  at  that  patient’s  record. 
A simple model could formalize this intuition by showing the action of reading the record having no value for
treatment  and  not  leading  to  a  state  that  is  any  better  for  treatment. 

However,  such  a  physician  could  use  similar  reasoning  to  that  which  justified  posting  medical  records 
in  Section  4.2  to  argue  that  such  record  reading  is  for  the  purpose  of  treatment.  For  example,  suppose  that 
Dr.  Z  reads  patient  Y’s  record  despite  not  being  involved  in  Y’s  treatment.  Dr.  Z  may  argue  that  he  is 
“proactively”  reading  Y’s  record  because  with  some  non-zero  probability  Y  will  come  to  Dr.  Z  in  the  future 
as  a  patient  and  will  receive  better  treatment  as  a  result  of  Dr.  Z  already  being  familiar  with  Y.  Dr.  Z  might 
formalize  his  argument  with  a  model  like  the  one  shown  in  Figure  4.1,  but  with  reading  the  record  in  place 
of posting it. Such a model shows more details of the environment than our initial model. Unlike our
initial  model,  this  more  detailed  model  vindicates  the  physician. 

However,  this  model  does  not  help  guide  the  physician  in  picking  which  record  to  read.  To  do  so,  we 
need  to  replace  the  generic  read  action  with  a  separate  one  for  each  record  that  the  physician  could  read. 
Figure  4.2  shows  such  a  model.  For  simplicity,  we  presume  that  reading  a  record  will  only  affect  Dr.  Z’s 
treatment  of  the  next  patient.  (Later  in  this  section,  we  will  remove  this  restriction  by  modeling  multiple 
sequential patient treatments.) This model has $n$ patients in the RHIO. $p_i$ represents the probability that
Dr. Z will see patient $i$ next. $p_0$ represents the probability that Dr. Z does not have a next patient. The
action $\mathrm{read}_i$ represents reading the $i$th patient's record. The reward $\rho^t_i$ measures how well Dr. Z
treats patient $i$ without seeing that patient's record beforehand; $\delta^r_i$ represents the improvement in how well
Dr. Z may treat patient $i$ with seeing the record beforehand. This model is optimized by reading the record
for the patient $i$ that maximizes the expected improvement $p_i * \delta^r_i$. Let $i^*$ represent the value of $i$ that optimizes
$p_i * \delta^r_i$. (For simplicity, we presume uniqueness.)

Figure 4.2: A simple MDP representing reading records from an RHIO. Each state is labeled with the identifying
number of the patient that needs treatment.

Figure 4.3: A more detailed MDP representing reading records from an RHIO. Each state is labeled with the identifying
number of the patient that needs treatment.

At this point, the reader might be worried that our formalism would allow anyone to read Y's record in
the case where $i^* = Y$ because with some non-zero odds this could improve treatment. However, with an
even more detailed model, the auditor can show that this is not the case. In particular, the auditor could also
model the ability of the physician to further his knowledge by studying medical advances instead of patient
records. Such a model is shown in Figure 4.3. The model does not show all the possible $\mathrm{read}_i$ actions.
Rather it shows only $\mathrm{read}_{i^*}$ where $i^*$ is the patient that maximizes the expected improvement in treatment as
above. Since this is the only $\mathrm{read}_i$ action that an optimizing agent would choose (as shown in the model of
Figure 4.2), this simplification does not affect planning. The model now includes the action study. Studying
yields an improvement of $\delta^s_i$ in the level of treatment the physician may provide to patient $i$. The model
shows that the physician should only read the record of patient $i^*$ when $p_{i^*} * \delta^r_{i^*} > \sum_{i=1}^{n} p_i * \delta^s_i$.
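
The one-step rule above can be sketched in Python as follows. This is an illustrative sketch rather than the thesis's implementation: the function names are hypothetical, the lists `p`, `delta_r`, and `delta_s` stand for the per-patient probabilities and improvements from reading and studying, and the numbers in the usage comment are made up.

```python
from typing import Sequence


def best_record(p: Sequence[float], delta_r: Sequence[float]) -> int:
    """Index i* maximizing the expected improvement p[i] * delta_r[i]."""
    return max(range(len(p)), key=lambda i: p[i] * delta_r[i])


def reading_is_optimal(p: Sequence[float],
                       delta_r: Sequence[float],
                       delta_s: Sequence[float]) -> bool:
    """True when reading record i* beats studying in the one-step model."""
    i_star = best_record(p, delta_r)
    gain_from_reading = p[i_star] * delta_r[i_star]
    # Studying helps with every patient, so its gain sums over all of them.
    gain_from_studying = sum(pi * ds for pi, ds in zip(p, delta_s))
    return gain_from_reading > gain_from_studying


# Hypothetical example: three equally likely patients (1% each).
print(reading_is_optimal([0.01] * 3, [5.0] * 3, [1.0] * 3))  # reading gain 0.05 vs studying gain 0.03
```

The asymmetry is visible in the code: reading contributes a single term, while studying contributes a sum over every patient the physician might see.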

It may be difficult to assign exact values to the rewards $\rho^t_i$, $\delta^s_i$, and $\delta^r_i$. However, the auditor can reason
more abstractly. For example, if the auditor can show that $p_{i^*}$ is about 0.01, then the improvement from
reading the record of $i^*$ must be at least 100 times that of the expected improvement from studying (i.e.,
$\delta^r_{i^*} > 100 * \sum_{i=1}^{n} p_i * \delta^s_i$). For many patients and physicians, such a large improvement from reading the record ahead of
seeing the patient seems unlikely. As $p_{i^*}$ goes to 1, the physician will be justified in reading the record even
if doing so is no better than studying. This effect matches our intuition that physicians should be allowed
to read the records of patients that they are scheduled to see (i.e., with $p_{i^*}$ near 1). Likewise, as the number
of possible patients decreases, we would expect $p_{i^*}$ to increase and justify the physician reading the record.
This  effect  matches  our  intuition  that  a  physician  inspecting  the  records  of  his  small  family  practice  is  less 
suspicious  than  a  physician  looking  over  records  of  a  large  health  system,  which  is  still  less  suspicious  than 
a  physician  looking  over  records  of  an  entire  RHIO. 

In  summary,  while  the  above  modeling  does  not  provide  a  definitive  answer  without  computing  all  the 
reward values, it does provide insight into the problem. In particular, it shows that unless $p_{i^*}$ is high or
$\delta^r_{i^*}$ is large compared to the value of studying, the physician will be better off studying.


4.4  Interactions  among  Patient  Information 

We went from a simple model that showed that Dr. Z could not read any records, to a more detailed one
showing that Dr. Z could always read the records of $i^*$, to an even more detailed model showing that it
depends upon whether studying would be a better use of Dr. Z's time. This progression of models shows
that unlike some properties (such as safety properties), an over- or under-approximation might not be sound.
Thus, an auditor might worry that the above model is still not detailed enough to produce the correct result.
While we cannot prove that it is detailed enough, we can explore extensions of the model and show that
the result remains fairly stable under them.

One  might  object  to  the  dichotomy  of  studying  and  reading  records  since,  in  some  cases,  reading  a  record 
for  one  patient  might  aid  in  the  treatment  of  another  patient.  We  can  model  this  secondary  effect  of  reading 
a record by having a different reward $\rho^j_i$ for all $i$ and $j$ from 1 to $n$ representing the level of treatment that
the physician provides to patient $i$ after reading the record $j$. ($\delta^r_i$ would equal $\rho^i_i - \rho^t_i$ under this notation.) In
this case, instead of calculating $i^*$ as above, we would calculate it as that value of $j$ that maximizes $\sum_{i=1}^{n} p_i * \rho^j_i$.
Despite  complicating  the  calculation,  the  final  result  still  depends  upon  whether  reading  the  records  is  more 
helpful  for  treatment  than  studying. 
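
A hedged sketch of this generalized selection follows. The matrix layout `rho[i][j]` (treatment level for patient `i` after reading record `j`), the function name, and the example numbers are assumptions made for illustration only.

```python
from typing import Sequence


def best_record_with_spillover(p: Sequence[float],
                               rho: Sequence[Sequence[float]]) -> int:
    """Pick the record j that maximizes sum over i of p[i] * rho[i][j].

    rho[i][j] is the level of treatment given to patient i after the
    physician has read record j (0-based indices in this sketch).
    """
    n = len(p)

    def expected_level(j: int) -> float:
        return sum(p[i] * rho[i][j] for i in range(n))

    return max(range(n), key=expected_level)


# Hypothetical example: record 2 helps patient 2 a great deal and patient 0 a little.
p = [0.5, 0.3, 0.2]
rho = [
    [10.0, 12.0, 11.0],
    [10.0, 13.0, 10.0],
    [10.0, 10.0, 20.0],
]
print(best_record_with_spillover(p, rho))
```

The chosen record is then compared against studying exactly as before; only the way $i^*$ is computed changes.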

With  the  increasing  use  of  approaches  like  data  mining  for  diagnosis,  we  could  consider  the  reading 
of  every  medical  record  for  the  purpose  of  comparative  study  of  patients.  One  could  model  this  either  as 
a  series  of  reads  and  rewards  that  depends  upon  every  previously  read  record,  or  as  a  single  read  every 
record  action.  However,  a  physician  does  not  need  to  actually  read  every  record  to  use  machine  learning. 
Rather  the  system  could  process  the  records  and  show  the  physician  only  the  results.  While  the  results  might 
contain sensitive information, they are likely to be less sensitive than the actual records themselves. Thus, the
physician could benefit from the information in the records without actually seeing them in their entirety.
To  capture  this  in  our  model,  we  would  have  to  decompose  the  action  into  the  smaller  actions  that  could 
make  it  up:  reading  records,  processing  the  records,  reading  results.  As  before,  we  could  find  that  under 
normal  circumstances,  the  physician’s  time  is  better  spent  studying  aggregate  results  than  reading  records 
individually.  This  case  shows  the  effects  of  another  way  to  make  a  model  more  detailed:  decomposing  an 
action  allows  an  auditor  to  pass  more  fine-grained  judgment. 

One might want to make the purpose of treatment parametric in each patient and interpret treatment as
meaning treatment for some patient. (We do not believe this interpretation to have been the one intended by
the authors of HIPAA. Indeed, this is not the interpretation taken by DeYoung et al. in their formalization
of HIPAA.) Making this change results in looking at patient $i$'s record as being for the treatment of patient
$i$ whenever $\delta^r_i > \delta^s_i$, which seems rather likely. Intuitively, if one is concerned with the treatment of a single
patient  instead  of  treatment  in  general,  it  makes  sense  to  read  that  patient’s  medical  record.  (Previously,  we 
implicitly  modeled  the  purpose  of  treatment  as  the  sum  of  the  values  for  treating  each  of  the  patients.) 


4.5  Multiple  Time  Steps 

In  the  above  examples,  for  simplicity,  we  modeled  the  physician  as  having  only  a  single  time  step  during 
which  to  either  read  a  record  or  study.  In  actuality,  physicians  have  multiple  chunks  of  time  that  may  be 
spent  on  some  combination  of  these  two  actions.  We  are  interested  in  finding  out  how  the  presence  of 
multiple  time  steps  affects  which  behaviors  are  allowable  to  the  physician.  In  particular,  we  would  like  to 
know if it is still the case that the physician should only read the record of patient $i^*$
when $p_{i^*} * \delta^r_{i^*} > \sum_{i=1}^{n} p_i * \delta^s_i$.

After constructing a model of multiple time steps, we will approach this problem by using our implementation
of the AuditNMDPapprox algorithm discussed in Section 2.4 to see how accurately the above rule
predicts  when  a  physician  may  read  a  record.  We  find  that  at  least  for  small  numbers  of  steps,  the  above  rule 
approximately  holds. 

4.5.1  Modeling 

We assume that the effects of reading a medical record or studying wear off over time. Not only does this
assumption model the limited nature of human memory, but it also allows us to model an infinite number of
time  steps  using  a  finite  model.  In  particular,  we  encode  the  last  h  actions  (reading,  studying,  and  treating) 
of  the  physician  in  the  states  of  the  model.  The  reward  for  treating  a  patient  depends  upon  this  history. 

We  model  this  example  as  a  family  of  NMDPs  that  depend  upon  the  parameter  h,  the  number  of  steps 
before  a  physician  forgets  something.  For  simplicity,  we  assume  that  the  number  of  patients  is  equal  to  h 
as well. (Having more patients than the physician can remember cannot change his behavior.) In the case
where more than $h$ patients are stored in an RHIO, we consider the subset of the RHIO that holds the patients
that would benefit the most from having their records read (those that maximize $p_i * \delta^r_i$).

Formally, let $m^{\mathrm{ex4}}_h$ be the model for the parameter $h$ (and others introduced below). $m^{\mathrm{ex4}}_h = (S, A, t, r, \gamma)$
where the action space $A$ is equal to $\{\mathrm{stop}, \mathrm{treat}, \mathrm{study}, \mathrm{read}_1, \ldots, \mathrm{read}_h\}$. The state space $S$ is equal to
$\{\mathrm{treat}, \mathrm{study}, \mathrm{read}_1, \ldots, \mathrm{read}_h\}^h \times C$ where $\{\mathrm{treat}, \mathrm{study}, \mathrm{read}_1, \ldots, \mathrm{read}_h\}^h$ encodes an $h$-step history of
the physician's actions and $C$ is the set of possible conditions in which the physician may currently find himself.
The history records, in order, which action the physician made in each of the $h$ most recent steps before
the current step unless the physician has taken the do-nothing action stop. Once the physician performs stop,
the history is frozen at its current value and does not record the current or future stop actions. The history
does not record the stop actions since they always result in returning to the current state, making updating
the history impossible. However, this failure to record the stop action does not alter the optimal strategy of
the NMDP $m^{\mathrm{ex4}}_h$ since stop is of zero reward and results in a self-loop at every state. The set of conditions $C$
is equal to $\{\emptyset, o, 1, \ldots, h\}$ where $\emptyset$ represents no patients currently wanting to see the physician, $o$ (short for
other) represents a patient not in the RHIO attempting to see the physician, and $i$ in $\{1, \ldots, h\}$ represents
the $i$th patient of the RHIO attempting to see the physician.

form of state                                                                        action               reward
$((a_1, \ldots, a_h), c)$ (any state)                                                study                $0$
$((a_1, \ldots, a_h), c)$ (any state)                                                $\mathrm{read}_i$    $0$
$((a_1, \ldots, a_h), \emptyset)$                                                    treat                $0$
$((a_1, \ldots, a_h), o)$                                                            treat                $\rho^t_o + n * \delta^s_o$
$((a_1, \ldots, \mathrm{read}_i, \ldots, a_h), i)$                                   treat                $\rho^t_i + \delta^r_i + n * \delta^s_i$
$((a_1, \ldots, a_h), i)$ where $\mathrm{read}_i$ is not in $(a_1, \ldots, a_h)$     treat                $\rho^t_i + n * \delta^s_i$

Table 4.1: The rewards for $m^{\mathrm{ex4}}_h$. In the last three rows, $n$ stands for the number of instances of study in $(a_1, \ldots, a_h)$.



At  each  time  step,  the  physician  chooses  whether  to  treat  the  current  patient  (if  any),  read  a  patient’s 
record,  study,  or  do  nothing.  This  updates  his  history  by  replacing  the  oldest  of  the  h  entries  with  this 
choice. The condition also probabilistically updates to a value of $C$. The transition function $t$ is such that
$t(((a_1, a_2, \ldots, a_h), c), a)$ is equal to a distribution $d$ over these possible next states. In particular, $d$ depends
upon additional parameters $p_c$ for each $c$ in $C$. $p_c$ provides the probability of $c$ being the next condition
in which the physician finds the hospital. The distribution $d$ assigns the probability of $p_c$ to the next state
$((a_2, \ldots, a_h, a), c)$ for each $c$ and the probability of 0 for all other states where $a$ is the current action of the
physician.
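
The transition function can be sketched in Python as below. This is a sketch under stated assumptions, not the Racket implementation used for the experiments: states are encoded as `(history, condition)` tuples, the condition labels `"none"` and `"other"` stand in for the no-patient and other-patient conditions, and the action names `"read1"`, `"read2"` are hypothetical spellings of the read actions.

```python
from typing import Dict, Hashable, Tuple

# A state pairs an h-step action history with the current condition.
State = Tuple[Tuple[str, ...], Hashable]


def transition(state: State, action: str,
               p: Dict[Hashable, float]) -> Dict[State, float]:
    """Distribution over next states for taking `action` in `state`.

    `p` maps each condition c to its probability p_c of being the next one.
    """
    history, _condition = state
    if action == "stop":
        # stop is a self-loop and is never recorded in the history.
        return {state: 1.0}
    # Drop the oldest history entry and append the current action.
    next_history = history[1:] + (action,)
    return {(next_history, c): prob for c, prob in p.items()}


# Hypothetical parameters for h = 2.
p = {"none": 0.03, "other": 0.95, 1: 0.01, 2: 0.01}
d = transition((("study", "read2"), "none"), "study", p)
print(d[(("read2", "study"), 2)])
```

Every next state not listed in the returned dictionary implicitly has probability 0.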

Table 4.1 lists the rewards for each state and action. The reward for the actions $\mathrm{read}_i$ or study is always
0. The reward for treat depends upon the state. For the state $((a_1, \ldots, a_h), \emptyset)$, the reward is also 0. For the
state $((a_1, \ldots, a_h), o)$, the reward is $\rho^t_o + n * \delta^s_o$ where $\rho^t_o$ is the base reward for treating a patient not in the
database, $n$ is the number of instances of study in $(a_1, \ldots, a_h)$, and $\delta^s_o$ is the additional reward achieved per
studying action. For the state $((a_1, \ldots, a_h), i)$, if there exists $j$ in $\{1, \ldots, h\}$ such that $a_j = \mathrm{read}_i$, the reward
will be $\rho^t_i + \delta^r_i + n * \delta^s_i$ where $\rho^t_i$ is the base reward for treating patient $i$, $\delta^r_i$ is the additional reward for having
read the patient's record, and $n$ and $\delta^s_i$ are as before. For the state $((a_1, \ldots, a_h), i)$, if there does not exist $j$
in $\{1, \ldots, h\}$ such that $a_j = \mathrm{read}_i$, the reward will be $\rho^t_i + n * \delta^s_i$. $\rho^t_i$, $\delta^r_i$, $\delta^s_i$, $\rho^t_o$, and $\delta^s_o$ are all additional
parameters to the model. We also treat the discounting factor $\gamma$ as a parameter.
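
The reward rules just described can be written as a small Python function. The encoding is an assumption of this sketch: conditions are `"none"`, `"other"`, or a patient number, read actions are spelled `"read1"`, `"read2"`, and so on, and the parameter dictionaries and example values are hypothetical.

```python
from typing import Dict, Hashable, Tuple

State = Tuple[Tuple[str, ...], Hashable]


def reward(state: State, action: str,
           rho_t: Dict[Hashable, float],
           delta_r: Dict[Hashable, float],
           delta_s: Dict[Hashable, float]) -> float:
    """Reward for taking `action` in `state`, following the rules above."""
    history, condition = state
    # Only treating an actual patient yields a reward.
    if action != "treat" or condition == "none":
        return 0.0
    n = history.count("study")  # number of study actions still remembered
    value = rho_t[condition] + n * delta_s[condition]
    # Bonus for having read this RHIO patient's record within the history.
    if condition != "other" and f"read{condition}" in history:
        value += delta_r[condition]
    return value


# Hypothetical parameters for h = 2 with identical patients.
rho_t = {"other": 1000.0, 1: 1000.0, 2: 1000.0}
delta_r = {1: 97.0, 2: 97.0}
delta_s = {"other": 1.0, 1: 1.0, 2: 1.0}
print(reward((("study", "read2"), 2), "treat", rho_t, delta_r, delta_s))
```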

The number of actions is $|\{\mathrm{stop}, \mathrm{treat}, \mathrm{study}, \mathrm{read}_1, \ldots, \mathrm{read}_h\}| = 3 + h$. The number of states is

$|\{\mathrm{treat}, \mathrm{study}, \mathrm{read}_1, \ldots, \mathrm{read}_h\}^h \times C| = (2 + h)^h * (2 + h) = (h + 2)^{h+1}$

For every state $s$ and action $a$ except stop, each of the possible $h + 2$ conditions in $C$ could arise in the
next state from performing action $a$ in state $s$. Presuming all the probability parameters $p_c$ are non-zero, the
resulting number of non-zero transitions is

$|S| * |A - \{\mathrm{stop}\}| * |C| + |S| = (h + 2)^{h+1} * (h + 3 - 1) * (h + 2) + (h + 2)^{h+1} = (h + 2)^{h+3} + (h + 2)^{h+1}$

where the second summand accounts for the self-loop under stop at each state.
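
These counts are easy to check mechanically. The sketch below evaluates the closed forms and cross-checks the state count by brute-force enumeration; the function names and the string labels for actions and conditions are assumptions made for illustration.

```python
from itertools import product
from typing import List, Tuple


def model_size(h: int) -> Tuple[int, int, int]:
    """Return (number of actions, number of states, number of non-zero transitions)."""
    num_actions = 3 + h                       # stop, treat, study, read_1..read_h
    num_conditions = h + 2                    # none, other, 1..h
    num_states = (h + 2) ** h * num_conditions
    # Every non-stop action can lead to each condition; stop adds one self-loop per state.
    num_transitions = num_states * (num_actions - 1) * num_conditions + num_states
    return num_actions, num_states, num_transitions


def enumerate_states(h: int) -> List[tuple]:
    """Explicitly list all (history, condition) pairs."""
    history_actions = ["treat", "study"] + [f"read{i}" for i in range(1, h + 1)]
    conditions = ["none", "other"] + list(range(1, h + 1))
    return [(hist, c) for hist in product(history_actions, repeat=h) for c in conditions]


print(model_size(2))             # (5, 64, 1088)
print(len(enumerate_states(2)))  # 64
```

The rapid growth in $h$ (625 states and 16250 transitions already at $h = 3$) is one reason the experiments below are restricted to small values of $h$.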

Since $m^{\mathrm{ex4}}_2$ has 64 states and 1088 non-zero transitions, we cannot easily represent the whole model in
a diagram. Thus, in Figure 4.4, we show just part of $m^{\mathrm{ex4}}_2$. It shows only the part of the NMDP relevant to
transitions from the states $((s, 2), \emptyset)$ or $((2, s), 2)$. This part of $m^{\mathrm{ex4}}_2$ is sufficient to illustrate the possibility
of multi-state cycles. In particular, it shows the possibility of executions of the following form

$[((\mathrm{study}, \mathrm{read}_2), \emptyset), \mathrm{study}, ((\mathrm{read}_2, \mathrm{study}), 2), \mathrm{read}_2, ((\mathrm{study}, \mathrm{read}_2), \emptyset), \ldots]$

Figure 4.4: Part of the NMDP $m^{\mathrm{ex4}}_2$. The figure only shows transitions and rewards originating at either state
((s, 2), none) or ((2, s), 2). It only shows states involved in one of these transitions. It abbreviates the state $((a_1, a_2), c)$
as $a_1 a_2{:}c$ where $a_1$ and $a_2$ are abbreviations for actions: $\mathrm{read}_1$ becomes 1; $\mathrm{read}_2$ becomes 2; and treat, t.

Reasoning about this family of models abstractly as we did for the one-step model is much more difficult
since we now have the possibility of cycles. Thus, we will instead reason about concrete models created
by fixing the values of the parameters. To do so, we will employ the AuditNMDPapprox algorithm of
Section 2.3 to aid us and illustrate the usefulness of our formalism.

4.5.2  Methodology 

We conducted experiments with our implementation to gain a feel for how the values of the parameters
affect the allowed behavior. For simplicity, in our experiments, we treat all patients in the RHIO as identical
and use the same rewards in the case of a patient not in the RHIO (i.e., $p_i = p_j$, $\rho^t_i = \rho^t_j = \rho^t_o$, $\delta^r_i = \delta^r_j$, and
$\delta^s_i = \delta^s_j = \delta^s_o$ for all $i$ and $j$ in $\{1, \ldots, h\}$). Thus, we simply write $p_1$ in the place of $p_i$ for all $i$ in $\{1, \ldots, h\}$.
We also write $\delta^r$ in the place of $\delta^r_i$, $\rho^t$ for $\rho^t_i$ or $\rho^t_o$, and $\delta^s$ for $\delta^s_i$ or $\delta^s_o$ for all $i$ in $\{1, \ldots, h\}$.

Our experiments study how large the improvement $\delta^r$ must be compared to the improvement $\delta^s$ before
obeying the policy means that the physician must read a patient's record instead of studying. Given fixed
values for all other parameters, we call the lowest value of $\delta^r$ for which the physician may read the record
of a patient without violating the policy the reading threshold.

For the single-step model, we can deduce the reading threshold from the rule that reading a record
is acceptable if and only if $\max_i p_i * \delta^r_i > \sum_{i=1}^{n} p_i * \delta^s_i$. Under the assumption that all patients in the
RHIO are identical and allowing the possibility of a patient not in the RHIO, this inequality becomes
$p_1 * \delta^r > p_o * \delta^s + \sum_{i=1}^{h} p_i * \delta^s$, which provides the reading threshold of $\frac{h * p_1 + p_o}{p_1} \delta^s$.
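
As a quick check of this closed form, the sketch below evaluates it for the parameter settings reported in Table 4.2 and gives the AERT values listed there. The function and argument names are assumptions made for illustration; the numeric inputs come from the table.

```python
def single_step_reading_threshold(h: int, p1: float, p_other: float,
                                  delta_s: float = 1.0) -> float:
    """Reading threshold of the single-step model: (h * p1 + p_o) / p1 * delta_s.

    p1 is the probability of seeing any one RHIO patient next, and p_other is
    the probability of seeing a patient who is not in the RHIO.
    """
    return (h * p1 + p_other) / p1 * delta_s


# (h, p_1, p_o) settings taken from Table 4.2, with delta_s = 1.
for h, p1, p_other in [(2, 0.01, 0.95), (2, 0.0001, 0.9698),
                       (2, 0.0001, 0.95), (2, 0.01, 0.8), (3, 0.01, 0.94)]:
    print(h, p1, p_other, round(single_step_reading_threshold(h, p1, p_other), 3))
```

The threshold does not depend on the base reward $\rho^t$, the discounting factor $\gamma$, or the probability of seeing no patient, which is why the AERT is constant across those columns of the table.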

Analytically determining the reading threshold for $m^{\mathrm{ex4}}_h$ is possible in a manner similar to how we did
so for the single-step model. However, this analysis is complicated by the presence of non-trivial cycles in
the  model  and  would  involve  solving  a  system  of  equations.  Thus,  we  instead  estimate  the  reading  threshold 
in  three  manners.  For  the  first  estimation,  we  use  the  reading  threshold  of  the  single  step  model.  In  the 
context  of  our  multi-step  model,  we  call  this  value  the  analytically  estimated  reading  threshold  (AERT) 
since  we  find  it  by  analysis  of  a  simplified  model. 

For our second estimation, we use our implementation to estimate the reading threshold using simulations.
We call this estimation the simulatively estimated reading threshold (SERT). Each simulation corresponds
to setting the value of $\delta^r$ to some value $v$ and testing with the AuditNMDPapprox implementation
whether reading is allowed at the value $v$. In particular, we test whether studying (as opposed to reading a
record) at the state $((\mathrm{treat}, \ldots, \mathrm{treat}), \emptyset)$ is a violation of the policy. If so, then $v$ is an upper bound on the
reading threshold; if not, then $v$ is a lower bound. The algorithm establishes the initial lower bound as 1 and
finds an initial upper bound by exponentially increasing the value of $v$ until AuditNMDPapprox returns
true. After establishing initial lower and upper bounds, the estimation algorithm iteratively uses their average
for the next value of $v$ tested by AuditNMDPapprox to find either a tighter lower or upper bound.
The estimation algorithm continues until the bounds are within 1% of one another and uses their average as
the  SERT.  If  we  were  using  AuditNMDP,  then  this  procedure  would  guarantee  that  the  resulting  SERT  is 
within  0.5%  of  the  true  reading  threshold.  However,  since  we  use  the  approximate  AuditNMDPapprox, 
the  SERT  may  be  further  from  the  true  reading  threshold. 

For our third estimation, we also use primarily simulation. Thus, we denote it as SERT'. However, it is a
hybrid approach using the AERT as the initial value $v$ tested with our AuditNMDPapprox implementation.
Depending upon whether AuditNMDPapprox establishes the AERT to be a lower or upper bound, the
algorithm will search for the missing bound using an exponential search. Then, the algorithm zeros in on an
estimation using the same method as for the SERT. This hybrid approach is not necessarily quicker than the
method for the SERT since the AERT could be very inaccurate. Nor is the SERT' necessarily more accurate
than the AERT since its attempts to improve upon the AERT could actually decrease its accuracy.
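
A sketch of the hybrid search follows, again with a hypothetical oracle `studying_violates` standing in for AuditNMDPapprox. The source does not spell out the exponential search for the missing bound, so doubling upward, halving downward, the floor of 1 on the lower bound, and the relative stopping rule are all assumptions of this sketch. It also assumes the AERT is at least as large as the floor.

```python
from typing import Callable


def bisect_threshold(studying_violates: Callable[[float], bool],
                     lower: float, upper: float, tolerance: float) -> float:
    """Shrink [lower, upper] by bisection until the bounds are within tolerance."""
    while (upper - lower) > tolerance * lower:
        mid = (lower + upper) / 2.0
        if studying_violates(mid):
            upper = mid
        else:
            lower = mid
    return (lower + upper) / 2.0


def estimate_sert_prime(studying_violates: Callable[[float], bool],
                        aert: float, tolerance: float = 0.01,
                        floor: float = 1.0) -> float:
    """Hybrid estimate: start from the AERT, find the missing bound, then bisect."""
    if studying_violates(aert):
        # The AERT is an upper bound: search downward for a lower bound.
        upper = aert
        lower = max(aert / 2.0, floor)
        while lower > floor and studying_violates(lower):
            upper = lower
            lower = max(lower / 2.0, floor)
    else:
        # The AERT is a lower bound: search upward for an upper bound.
        lower = aert
        upper = aert * 2.0
        while not studying_violates(upper):
            lower = upper
            upper *= 2.0
    return bisect_threshold(studying_violates, lower, upper, tolerance)


# Hypothetical oracle with true threshold 98.9, starting from an AERT of 97.
print(estimate_sert_prime(lambda v: v >= 98.9, 97.0))
```

When the AERT is already close to the true threshold, the missing bound is found after a single extra oracle call, which is where any speed advantage over the plain search would come from.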

We  implemented  these  estimation  techniques  in  the  Racket  dialect  of  Scheme  to  use  our  implementation 
of  the  AuditNMDPapprox  algorithm.  They  may  be  downloaded  from: 

http://www.cs.cmu.edu/~mtschant/thesis/

We ran our implementations on a Lenovo U110 laptop computer with 3GB of memory and a 1.60 GHz Intel
Core  2  Duo  CPU  running  the  DrRacket  interpreter  in  Windows  Vista. 

4.5.3  Results 

We compare the estimations AERT, SERT, and SERT' across several models in the family $m^{\mathrm{ex4}}_h$. Table 4.2
summarizes the results for each model we studied. The table also reports the running time required to
compute the SERT and SERT' for each model.

For all experiments, we use $\delta^s$ equal to 1. Most of the experiments used $h = 2$. For each of these
experiments, we used three different values for the discounting factor $\gamma$: 0.9, 0.1, and 0.01. We ran four
experiments with $h = 3$ and $\gamma = 0.01$. In two cases, we computed the SERT' but not the SERT due to the long
running time of these cases.

The  table  does  not  report  upon  the  running  time  for  a  single  call  to  AuditNMDPapprox.  For  the 
h  =  2  cases,  the  running  time  for  a  single  call  of  AuditNMDPapprox  varied  between  1.3  and  29  seconds. 
For  the  h  =  3  cases,  the  running  time  varied  between  202  seconds  and  70  minutes.  Each  computation  of 
SERT took between 10 and 12 calls to AuditNMDPapprox. Each computation of SERT' took 9 calls.

Examining Table 4.2, we see the following patterns. As expected from the fact they are computed in
very similar manners, the SERT and the SERT' are always very close to one another. With the exception
of four outliers in two different rows (in boldface), the AERT, SERT, and SERT' are all very close. The
AERT is consistently less than the other two estimations. This divergence increases as $\gamma$ increases, which
is intuitively because larger values of $\gamma$ increase the importance of steps after the first step (the only step
accounted  for  in  the  AERT). 

The outliers use the very low value of 1 for $\rho^t$. We compared the optimal strategies for these outliers
with similar models. Only in the case of the outliers does the physician read patient records rather than
provide treatment even when a patient is present. Intuitively, the physician shows this behavior since the
reward $\rho^t$ for treating a patient without having either read the patient's record or studying is less than the
expected increase in rewards for studying or reading a record. This effect disappears for lower values of $\gamma$
since  increases  in  future  rewards  become  more  heavily  discounted. 

As expected from known complexity results [Tse90], increasing $\gamma$ increases the running time. The
computation of SERT' is consistently faster than the computation of the SERT.

4.5.4  Discussion 

A base reward of $\rho^t = 1$ with a studying bonus of $\delta^s = 1$ seems unreasonable for most hospital settings
since it implies that a physician may double his ability to treat the average patient in the amount of time it
takes him to treat a patient. (These values might be reasonable for interns at a teaching hospital.) Putting
aside the outliers for this reason, we find that the AERT, SERT, and SERT' are close. We conclude that the
AERT is a good approximation of the reading threshold at least for small values of $h$.

$h$  $p_1$   $p_o$   $p_\emptyset$  $\rho^t$  $\gamma$  AERT  SERT          time      SERT'         time
2    0.01    0.95    0.03           1000      0.01      97    97.539        15 sec    97.379        13 sec
2    0.01    0.95    0.03           1000      0.1       97    97.539        20 sec    97.379        18 sec
2    0.01    0.95    0.03           1000      0.9       97    98.945        234 sec   98.895        211 sec
2    0.01    0.95    0.03           10000     0.01      97    97.539        16 sec    97.379        15 sec
2    0.01    0.95    0.03           10000     0.1       97    97.539        22 sec    97.379        19 sec
2    0.01    0.95    0.03           10000     0.9       97    98.945        272 sec   98.895        246 sec
2    0.01    0.95    0.03           100       0.01      97    97.539        16 sec    97.379        13 sec
2    0.01    0.95    0.03           100       0.1       97    97.539        19 sec    97.379        16 sec
2    0.01    0.95    0.03           100       0.9       97    98.945        196 sec   98.895        177 sec
2    0.01    0.95    0.03           10        0.01      97    97.539        15 sec    97.379        13 sec
2    0.01    0.95    0.03           10        0.1       97    97.539        16 sec    97.379        15 sec
2    0.01    0.95    0.03           10        0.9       97    98.945        162 sec   98.895        146 sec
2    0.01    0.95    0.03           1         0.01      97    97.539        13 sec    97.379        12 sec
2    0.01    0.95    0.03           1         0.1       97    97.539        16 sec    97.379        15 sec
2    0.01    0.95    0.03           1         0.9       97    **74.336**    128 sec   **74.076**    115 sec
2    0.0001  0.9698  0.03           1000      0.01      9700  9753.906      18 sec    9737.891      13 sec
2    0.0001  0.9698  0.03           1000      0.1       9700  9753.906      24 sec    9737.891      18 sec
2    0.0001  0.9698  0.03           1000      0.9       9700  9894.531      283 sec   9889.453      213 sec
2    0.0001  0.9698  0.03           10        0.01      9700  9753.906      18 sec    9737.891      13 sec
2    0.0001  0.9698  0.03           10        0.1       9700  9753.906      20 sec    9737.891      15 sec
2    0.0001  0.9698  0.03           10        0.9       9700  9894.531      194 sec   9889.453      145 sec
2    0.0001  0.9698  0.03           1         0.01      9700  9753.906      16 sec    9737.891      12 sec
2    0.0001  0.9698  0.03           1         0.1       9700  9753.906      20 sec    9737.891      15 sec
2    0.0001  0.9698  0.03           1         0.9       9700  **7363.281**  157 sec   **7369.727**  117 sec
2    0.0001  0.95    0.05           1000      0.01      9502  9542.969      18 sec    9539.117      13 sec
2    0.0001  0.95    0.05           1000      0.1       9502  9542.969      24 sec    9539.117      18 sec
2    0.0001  0.95    0.05           1000      0.9       9502  9683.594      283 sec   9687.586      211 sec
2    0.01    0.8     0.18           1000      0.01      82    82.07         15 sec    82.32         13 sec
2    0.01    0.8     0.18           1000      0.1       82    82.07         20 sec    82.32         18 sec
2    0.01    0.8     0.18           1000      0.9       82    84.18         234 sec   84.242        211 sec
3    0.01    0.94    0.03           1000      0.01      97    97.539        45 min    97.379        31 min
3    0.01    0.94    0.03           1000      0.1       97    97.539        63 min    97.379        54 min
3    0.01    0.94    0.03           1000      0.9       97    98.242        12 hours  98.137        461 min

Table 4.2: Results of experiments on $m^{\mathrm{ex4}}_h$. In all cases $\delta^s = 1$. The values for the estimations are rounded to three
decimal places. Four outliers, two each in two rows, are in boldface.

The  complexity  of  the  above  calculations  highlights  how  our  model  of  planning  does  not  correspond  to 
how humans plan (further discussed in Chapter 6). We cannot expect physicians to perform complex modeling
and analysis let alone to use computer simulations before deciding whether to read a record or study.
However,  compliance  officers  at  hospitals  may  find  these  results  helpful  while  drafting  policy  manuals. 

For  example,  consider  a  large  hospital  where  the  probability  of  a  physician  seeing  a  typical  patient  in  the 
RHIO is less than 1 in 10,000. At such a hospital, the reading threshold of about 9700 holds across various
values of $\rho^t$ and $\gamma$. Extrapolating from the results for the tests using $h = 3$ with $p_1 = 0.01$ instead of
0.0001, one may conclude that the value is likely to remain around 9700 for larger values of $h$. In many
settings, managers may find an improvement from reading a patient's record of 9700 times the improvement
from studying inconceivable. In this case, a policy manual may qualitatively summarize the quantitative
results  shown  in  Table  4.2  as  prohibiting  a  physician  from  reading  patient  records  unless  the  physician  has 
a  reason  to  believe  that  the  patient  is  much  more  likely  than  average  to  be  seeking  care.  Such  a  prohibition 
may  not  make  sense  at  a  small  practice  where  the  probability  of  seeing  an  average  patient  is  1  in  100  since 
reading  a  record  could  conceivably  produce  an  improvement  of  97  times  the  improvement  from  studying. 

4.6  Learning  Additional  Information 

Now  we  consider  the  effects  of  the  physician  learning  additional  information,  which  requires  POMDPs 
to  model.  The  above  example  assumes  that  Dr.  Z  has  a  fixed  probability  distribution  over  next  patients. 
However,  Dr.  Z  might  learn  information  that  leads  him  to  consider  some  patients  more  likely  than  others. 
In  the  extreme  case,  a  patient  might  present  himself  at  Dr.  Z’s  practice  leading  him  to  assign  the  probability 
of  1  to  that  patient.  In  this  case,  the  model  would  change  and  reading  that  patient’s  record  would  become 
the  optimal  plan. 

In a more complex example, Dr. Z might learn that someone named “John Smith” has been in an accident and will require treatment soon. As “John Smith” is a common name, Dr. Z might have more than one record bearing the name. If the number is small enough (fewer than 100 in the above model), then the best use of Dr. Z’s time will be reading as many of these records as possible. Even in the case where Dr. Z has time to read only a single record, making it unlikely that he will read the record of the John Smith coming for treatment, Dr. Z’s reading will be for the purpose of treatment.

In this case, Dr. Z should select one of the records bearing the name “John Smith” that has the highest expected improvement in treatment. In the case where more than one record has the same expected improvement, Dr. Z might be tempted to choose amongst these records with another illicit purpose in mind. To prevent Dr. Z from satisfying this other illicit purpose, his selection should be uniformly random amongst all these records. A medical record system could perform the selection for him to ensure randomness.
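The selection rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `select_record`, the record identifiers, and the use of exact equality to detect ties are all hypothetical, and a real record system would need its own notion of when two expected improvements count as equal.

```python
import secrets
from typing import Mapping


def select_record(expected_improvement: Mapping[str, float], rng=None) -> str:
    """Pick uniformly at random among the records tied for the highest score.

    expected_improvement maps a (hypothetical) record id to the expected
    improvement in treatment from reading that record.
    """
    if not expected_improvement:
        raise ValueError("no candidate records")
    # An OS-backed source of randomness, so the physician cannot steer the pick.
    rng = rng or secrets.SystemRandom()
    best = max(expected_improvement.values())
    # Assumption: ties are detected by exact equality of the scores.
    tied = sorted(r for r, v in expected_improvement.items() if v == best)
    return rng.choice(tied)


# Usage: two records tie for the highest expected improvement.
candidates = {"record-17": 0.8, "record-42": 0.8, "record-99": 0.3}
chosen = select_record(candidates)
```

Delegating the choice to the system, rather than to Dr. Z, removes the opportunity to let an illicit purpose break the tie.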

In more complex situations, incoming information might also adjust the probabilities without ruling any one patient out. For example, learning that a patient has a chronic condition and is likely to seek treatment could make that patient more likely relative to the others without ruling out any of the other patients.

Each of the above scenarios can be modeled using POMDPs. Using our formalism for information use, we can examine whether the incoming information is used for the purpose of treatment. In particular, if the incoming information causes the physician to read a patient’s medical record rather than study, our formalism will show that the information was used for the purpose of treatment. In the case where the information resulted in no change in the physician’s behavior, our formalism will show neither that it was used for treatment nor that it was not used for treatment, as the physician may have decided to study either with or without the additional information.


4.7  Revisiting  Uploading 

The above arguments about a physician having to make good use of his time do not invalidate the reasoning used to justify Dr. X posting Y’s record to the RHIO. In particular, the probability that some doctor in the region would need Y’s record at some point in the future is fairly high. However, the probability from above is comparatively small as it is limited to just Dr. Z and to only the period of time over which Dr. Z can remember the information. Furthermore, presuming suitable automation, Dr. X posting the record will take so little time that little else productive could be done in that time. Dr. Z reading a record, on the other hand, takes much longer.

Now,  the  reader  may  be  worried  that  our  formalism  allows  Dr.  X  to  post  the  record  anywhere  because 
with  some  odds  it  might  be  retrieved  from  that  location  to  improve  treatment.  While  our  formalism  allows 
such  posting,  the  above  example  involving  Dr.  Z  shows  that  our  formalism  does  not  allow  people  with 
access  to  these  postings  to  read  them  unless  there  is  a  good  reason.  If  we  trust  those  with  access  not  to 
abuse  their  access  (which  is  implicit  in  HIPAA),  then  distributing  the  record  does  not  have  negative  privacy 
implications. However, as we are not that trusting, one must weigh the risks of abuse against the possible benefits of access. While a formalism based on MDPs and planning may be helpful for such balancing, it is
outside  of  the  scope  of  this  work  formalizing  purpose. 
Chapter 5


Empirical  Study  of  Semantics 


5.1  Goals 

Both previous work and this work offer methods for enforcing privacy policies that feature purpose restrictions. These methods test whether a sequence of actions violates a clause of a privacy policy that restricts certain actions to be only for certain purposes. By providing a test for whether the purpose restriction is violated, these methods implicitly provide a semantics for these restrictions.

To  ensure  that  these  methods  correctly  enforce  the  privacy  policy,  one  must  show  that  the  semantics 
employed  by  a  method  matches  the  intended  meaning  of  the  policy.  Unfortunately,  determining  the  intended 
meaning  of  a  policy  from  its  text  is  often  impossible.  Furthermore,  these  policies  often  act  as  agreements 
among  multiple  parties  who  may  differ  in  their  interpretation  of  the  policy.  For  these  reasons,  we  compare 
the  semantics  proposed  by  these  methods  of  policy  enforcement  to  the  most  common  interpretations  of 
policies. 

While  previous  works  have  not  provided  a  formal  semantics,  it  appears  that  many  works  (e.g.,  [AF07, 
JSNS09])  flag  actions  as  a  violation  if  they  do  not  further  the  purpose  in  question.  (See  Section  7.1  for 
a  description  of  past  works.)  In  particular,  these  works  make  assumptions  about  how  people  think  about 
purpose  in  the  context  of  enforcing  a  privacy  policy  that  restricts  an  agent  to  only  performing  a  certain  class 
of  actions  for  a  certain  purpose.  The  following  hypothesis  characterizes  these  assumptions: 

H1. The agent obeys the restriction if and only if the action furthered the purpose.

This hypothesis entails the following hypothesis about how people interpret the meaning of purpose:

H1’. An action is for a purpose if and only if that action furthers that purpose.

Our  work  instead  asserts  that  an  action  may  be  for  a  purpose  even  if  that  purpose  is  never  furthered.  In 
particular,  we  assert  that  the  action  merely  has  to  be  part  of  a  plan  for  furthering  that  purpose.  Thus,  our 
formalism  assumes  the  following  hypothesis  (in  the  same  context  as  above): 

H2.  The  auditee  obeys  the  restriction  if  and  only  if  the  auditee  performed  that  action  as  part  of  a  plan  for 
furthering  that  purpose. 

(We do not construct our algorithms directly from Hypothesis H2. Rather they are approximations using only observable information.) Similarly, this hypothesis entails the following:

H2’. An action is for a purpose if and only if the auditee performed that action as part of a plan for furthering that purpose.

To show that our work provides a method of enforcing purpose restrictions more faithful to their common meaning, we would like to disprove Hypotheses H1 and H1’ while proving Hypotheses H2 and H2’.

As Hypothesis H1 is a bi-implication, we can disprove it by disproving either of the following hypotheses (here and henceforth, in the same context as above):

H1a. If the action furthers a purpose, then the auditee obeys the restriction.

H1b. If the auditee obeys the restriction, then the action furthers a purpose.

We will attempt to disprove both Hypotheses H1a and H1b.

Similarly,  Hypothesis  H2  breaks  into  two  sub-hypotheses: 

H2a.  If  the  auditee  performed  an  action  as  part  of  a  plan  for  furthering  a  purpose,  then  the  auditee  obeyed 
the  restriction. 

H2b. If the auditee obeyed the restriction, then the auditee performed the action as part of a plan for furthering that purpose.

We  will  test  both  of  these  hypotheses  by  providing  example  scenarios  of  an  auditee  performing  actions  with 
descriptions  of  his  plans.  However,  these  tests  will  not  prove  either  of  these  hypotheses  as  doing  so  would 
require  testing  them  under  all  scenarios.  Indeed,  given  that  some  tests  could  be  carefully  crafted  to  bring 
about  success  for  reasons  unrelated  to  planning,  such  testing  does  not  necessarily  provide  good  evidence  in 
favor  of  these  hypotheses.  To  provide  better  evidence  for  the  truth  of  Hypothesis  H2,  we  will  also  test  the 
following  related  hypothesis: 

H2c. Describing an action as being part of a plan for furthering a purpose, as opposed to not being part of such a plan, in a scenario causes people to think that the auditee obeyed the restriction.

H2c may be viewed as a causal or directional version of H2. Unlike H2a and H2b, which may be tested with unrelated scenarios, H2c must be tested with scenarios that only differ from one another in whether the action is part of a plan for the purpose in question.

For completeness we also test the causal version of H1:

H1c. Describing an action as furthering a purpose, as opposed to not furthering a purpose, in a scenario causes people to think that the auditee obeyed the restriction.

As Hypothesis H1 leads to Hypotheses H1a, H1b, and H1c, Hypothesis H1’ leads to corresponding hypotheses H1a’, H1b’, and H1c’. Similarly, H2’ leads to H2a’, H2b’, and H2c’. We also test these hypotheses to provide additional evidence for our formalism.

5.2  Methodology 

Approach. We may disprove Hypothesis H1a by exhibiting a scenario in which an action of an auditee furthers a purpose, but people feel that the auditee did not obey a purpose restriction stating that the action may only be performed for that purpose. We may disprove Hypothesis H1b by exhibiting a scenario in which an action does not further a purpose, but people feel that the auditee obeyed the restriction. To test Hypothesis H1c, we construct a pair of scenarios that differ only in whether the action furthered the purpose in question, and show that people’s feelings about whether the auditee obeyed the restriction are unchanged across the two scenarios.

                           Furthered purpose      Did not further purpose

Planned for purpose        $C_{pf}$               $C_{p\bar{f}}$

Not planned for purpose    $C_{\bar{p}f}$         $C_{\bar{p}\bar{f}}$

Table 5.1: Classes of Scenarios for Survey Questionnaire. Each position in the grid identifies the scenario class associated with the values of the two factors given on each axis.

Testing Hypotheses H2a, H2b, and H2c is similar to testing the corresponding hypotheses for H1. However, we expect the opposite results. For example, to test Hypothesis H2c, we construct a pair of scenarios that differ only in whether the auditee performed that action as part of a plan for furthering that purpose. We expect to show that people feel that the auditee obeyed the restriction only in the scenario in which the action is part of a plan for furthering that purpose.

To these ends, we use four classes of scenarios: Classes $C_{pf}$, $C_{p\bar{f}}$, $C_{\bar{p}f}$, and $C_{\bar{p}\bar{f}}$. Each class is determined by two factors: (1) whether the action furthers the purpose in question in the scenario and (2) whether the auditee performs the action as part of a plan for furthering the purpose. Table 5.1 identifies these classes along these two axes. (E.g., $C_{\bar{p}f}$ stands for the scenario that was not planned ($\bar{p}$) for the purpose but furthered ($f$) it.)

Showing that people think the auditee does not obey the restriction in Scenario Class $C_{\bar{p}f}$ is sufficient for disproving Hypothesis H1 by disproving Hypothesis H1a. Showing that people think the auditee obeys the restriction in Class $C_{p\bar{f}}$ provides additional evidence that previous approaches are insufficient by disproving the other direction, H1b, of the bi-implicational Hypothesis H1. Comparing Class $C_{pf}$ against $C_{p\bar{f}}$ tests Hypothesis H1c. Comparing Class $C_{\bar{p}f}$ against $C_{\bar{p}\bar{f}}$ also tests Hypothesis H1c.

For Hypothesis H2, showing that people think the auditee obeyed the restriction in Classes $C_{pf}$ and $C_{p\bar{f}}$ each provides evidence for Hypothesis H2a. Showing that people think the auditee does not obey the restriction in Classes $C_{\bar{p}f}$ and $C_{\bar{p}\bar{f}}$ each provides evidence for Hypothesis H2b by way of the contrapositive. Comparing Class $C_{pf}$ against $C_{\bar{p}f}$ and comparing Class $C_{p\bar{f}}$ against $C_{\bar{p}\bar{f}}$ test Hypothesis H2c.
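The way the four scenario classes separate the two hypotheses can be made concrete with a small enumeration. The sketch below is illustrative only: the function names and the plain-text labels (with `~` standing in for the overbar) are assumptions introduced here, not notation from the study.

```python
from itertools import product

# Each scenario class is a pair (planned, furthered).
CLASSES = list(product([True, False], repeat=2))


def h1_predicts_obeyed(planned: bool, furthered: bool) -> bool:
    # H1: the restriction is obeyed iff the action furthered the purpose.
    return furthered


def h2_predicts_obeyed(planned: bool, furthered: bool) -> bool:
    # H2: the restriction is obeyed iff the action was part of a plan
    # for furthering the purpose.
    return planned


def label(planned: bool, furthered: bool) -> str:
    # Plain-text class label; "~" marks a negated factor.
    return ("p" if planned else "~p") + ("f" if furthered else "~f")


# Classes on which H1 and H2 disagree, and hence which tell them apart.
discriminating = [label(p, f) for p, f in CLASSES
                  if h1_predicts_obeyed(p, f) != h2_predicts_obeyed(p, f)]

# Pairs differing only in furthering (for H1c) or only in planning (for H2c).
h1c_pairs = [(label(p, True), label(p, False)) for p in (True, False)]
h2c_pairs = [(label(True, f), label(False, f)) for f in (True, False)]
```

The two discriminating classes are exactly those used above to disprove H1a and H1b, while the remaining two classes are ones on which both hypotheses make the same prediction.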

Questionnaire  Construction.  We  constructed  a  questionnaire  with  four  scenarios,  one  from  each  of  the 
four  scenario  classes  above.  The  auditee  in  these  four  scenarios  is  subject  to  a  privacy  policy  that  states  that 
the  auditee  may  only  use  a  type  of  information  for  a  single  purpose.  The  policy  we  used  for  the  questionnaire 
is  as  follows: 

Metropolis  General  Hospital  and  its  employees  will  share  a  patient’s  medical  record  with  an 
outside  specialist  only  for  the  purpose  of  providing  that  patient  with  treatment. 

Table 5.2 presents the scenarios, where Scenario $S_{xy}$ is the scenario in Scenario Class $C_{xy}$.

$S_{pf}$. A case worker employed by Metropolis General Hospital meets with a patient. The case worker develops a plan with the sole goal of treating the patient. The plan includes sharing the patient’s medical record with an outside specialist. Upon receiving the record, the specialist succeeds in treating the patient.

$S_{p\bar{f}}$. A case worker employed by Metropolis General Hospital meets with a patient. The case worker develops a plan with the sole goal of treating the patient. The plan includes sharing the patient’s medical record with an outside specialist. Upon receiving the record, the specialist did not succeed in treating the patient.

$S_{\bar{p}f}$. A case worker employed by Metropolis General Hospital meets with a patient. The case worker develops a plan with the sole goal of reducing costs for the hospital. The plan includes sharing the patient’s medical record with an outside specialist. Upon receiving the record, the specialist succeeds in treating the patient.

$S_{\bar{p}\bar{f}}$. A case worker employed by Metropolis General Hospital meets with a patient. The case worker develops a plan with the sole goal of reducing costs for the hospital. The plan includes sharing the patient’s medical record with an outside specialist. Upon receiving the record, the specialist did not succeed in treating the patient.

Table 5.2: Questionnaire Scenarios. For each scenario class, the scenario used on the questionnaire.

For each scenario, we ask the participant five questions. The first two are simple questions, Questions Q1 and Q2, about each scenario. These questions have objectively correct answers that the participant can easily find by reading the scenarios. Checking that the participant chooses the correct answer allowed us to ensure that the participants actually read the scenario and answered accordingly rather than arbitrarily.

After priming the participant to have read the scenario with Questions Q1 and Q2, we ask the key questions for our study. While we are most interested in whether the participant believes that the auditee obeyed the policy, we start with the more basic Question Q3: whether the action was for the allowed purpose of treatment. We then ask Question Q4: whether the auditee obeyed the policy. For each of these four questions, the participant may select among yes, no, and I don’t know. We expected Questions Q3 and Q4 to be answered identically for each scenario. We included both to help determine whether the questionnaire was well worded and to test the Hypotheses H1’ and H2’. We conjecture that the majority of participants will answer Questions Q3 and Q4 with yes for the scenarios in classes $C_{pf}$ and $C_{p\bar{f}}$, and with no for $C_{\bar{p}f}$ and $C_{\bar{p}\bar{f}}$.

To  help  determine  the  reasoning  used  by  the  participants,  which  would  be  especially  useful  if  our  survey 
results  deviated  from  the  expected,  we  included  the  free  form  Question  Q5  asking  why  the  participant  chose 
the  answer  he  did  to  Question  Q4.  Table  5.3  shows  the  wording  of  these  questions. 

We  presented  these  scenarios  and  questions  as  a  questionnaire.  For  each  survey  participant,  we  randomly 
ordered  the  scenarios  to  reduce  the  effects  that  scenario  ordering  may  have.  For  each  scenario,  we  ordered 
the  questions  with  the  objective  questions,  Q1  and  Q2,  first  to  encourage  the  participant  to  read  the  scenario 
before  answering  the  subjective  questions  in  which  we  are  interested.  The  subjective  questions  are  ordered 
as follows: Q3, Q4, and, lastly, Q5. Appendix B.1 shows a sample questionnaire.
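The ordering rules above (scenarios shuffled per participant, questions in a fixed order within each scenario) can be sketched as follows. This is an assumption-laden illustration rather than the survey software actually used: the function `build_questionnaire`, the per-participant seed, and the plain-text scenario labels are all hypothetical.

```python
import random

# Plain-text stand-ins for the four scenario labels ("~" marks negation).
SCENARIOS = ["S_pf", "S_p~f", "S_~pf", "S_~p~f"]
# Objective questions first, then the subjective ones, always in this order.
QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5"]


def build_questionnaire(participant_seed: int):
    """Return (scenario, questions) pairs with the scenarios in random order."""
    rng = random.Random(participant_seed)  # hypothetical per-participant seed
    order = SCENARIOS[:]
    rng.shuffle(order)
    return [(scenario, list(QUESTIONS)) for scenario in order]


questionnaire = build_questionnaire(participant_seed=7)
```

Randomizing only the scenario order, and never the question order, keeps the objective questions in their role of prompting a careful reading before the subjective ones are answered.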
Q1. Was the goal of the case worker’s plan to treat the patient?

Q2. Did the specialist succeed in treating the patient?

Q3. Did the case worker share the record with the specialist for the purpose of treatment?

Q4. Did the case worker obey the above privacy policy?

Q5. Why did you answer Question 4 as you did?

Table 5.3: Questionnaire Questions


Pilot Study. Before running the main survey we conducted a small-scale pilot study of ten participants. The participants were recruited on Amazon Mechanical Turk (www.mturk.com) using a small payment of $1.50 (USD). (Appendix B.2 shows the advertisement.) Participants took the survey online using Mechanical Turk’s survey functionality without randomly ordering the scenarios.

The goal of this pilot study was to ensure that our recruitment and survey mechanisms worked. We also closely examined the responses to determine whether the participants were seriously answering the questions, and whether Questions Q1 and Q2 identified arbitrary responses. As the goal of this study was not to collect data on our hypotheses, we did not statistically analyze the data. However, we will qualitatively describe the results below.

In  the  pilot  study,  seven  of  the  ten  respondents  matched  our  predictions  perfectly.  One  respondent 
deviated  for  a  single  answer  in  a  manner  inconsistent  with  the  other  answers  provided  by  the  respondent. 
Thus,  we  suspect  that  his  response  is  most  likely  an  error  in  selecting  the  answer. 

A second respondent said that the action was not for the purpose of treatment in Scenarios $S_{\bar{p}f}$ and $S_{\bar{p}\bar{f}}$, but that, nevertheless, the case worker obeyed the policy since the specialist would try to provide treatment. This response suggests that Hypotheses H2 and H2’ are more than trivially different.

The third respondent to deviate from our hypothesis claimed that the action was for the purpose of treatment and the case worker obeyed the policy in all of the scenarios, including Scenarios $S_{\bar{p}f}$ and $S_{\bar{p}\bar{f}}$ where the goal of the case worker was cost reduction. This respondent’s answer to Question Q5 suggests a belief that the case worker did not violate the policy because the scenarios provide evidence that the specialist provided treatment, whereas they provide no evidence that any of the actions reduced costs. For example, this respondent provided the following for Question Q5 given Scenario $S_{\bar{p}f}$:

Though  the  case  worker’s  goal  was  cost-reduction,  the  medical  records  were  still  provided  for 
the  purpose  of  treating  the  patient;  simply  giving  medical  records  to  outside  specialists,  with  no 
further  actions,  would  not  be  a  way  to  reduce  costs  for  a  hospital. 

This  response  highlights  that  our  scenarios  discuss  treatment  in  more  detail  than  cost  reduction,  which  could 
have  unintended  effects  on  people’s  analysis  of  them. 

Interestingly,  while  these  two  deviations  do  not  match  our  Hypothesis  H2,  they  are  consistent  with 
the  approximations  our  algorithm  makes.  While  these  deviations  suggest  interesting  directions  for  future 
studies,  we  decided  that  these  issues  did  not  warrant  rewriting  the  scenarios  to  include  more  information  on 
cost  reduction  or  to  examine  more  carefully  the  differences  between  Hypotheses  H2  and  H2’. 
None of the respondents said that the policy was violated in Scenario $S_{p\bar{f}}$, providing evidence against Hypothesis H1. None of the respondents answered Questions Q1 or Q2 incorrectly and none of their responses appeared arbitrary.

Survey  Protocol.  The  main  survey  consisted  of  two  hundred  participants.  We  conducted  the  survey  in 
the  same  manner  as  the  pilot  study  but  with  three  changes.  First,  given  the  ease  with  which  we  recruited 
participants  for  the  pilot  study,  we  reduced  the  payment  to  $0.50. 

Second, while still using Mechanical Turk to recruit and pay participants, we used Survey Gizmo (www.surveygizmo.com) to conduct the survey. This change allowed us to randomly order the scenarios for each participant.

Third, given the success of Questions Q1 and Q2, we decided before the survey to exclude from the results any participants who got more than one of them wrong in total across all four scenarios. The odds of correctly guessing either all the answers or all but one are less than 4%, presuming the participant knows that I don’t know is never a correct answer.¹
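The 4% bound can be checked with exact arithmetic. The sketch below assumes, as the exclusion rule implicitly does, that a guessing participant answers each of the eight objective questions independently and uniformly at random; the helper name `prob_at_least` is introduced here for illustration.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int, p: Fraction) -> Fraction:
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


# Eight objective questions in total (two per scenario, four scenarios);
# a guesser escapes exclusion by getting at least seven right.
two_options = prob_at_least(7, 8, Fraction(1, 2))    # guessing between yes and no
three_options = prob_at_least(7, 8, Fraction(1, 3))  # also guessing "I don't know"
```

Here `two_options` evaluates to 9/256 (about 3.5%) and `three_options` to 17/6561 (about 0.26%), consistent with the stated bounds of 4% and 1%.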

We  analyzed  the  survey  responses  according  to  the  statistical  model  presented  in  the  next  section. 


5.3  Statistical  Modeling 

In  this  section,  we  provide  a  detailed  description  of  the  statistical  tests  we  employ  in  the  next  section.  Those 
with  a  background  in  hypothesis  testing  and  statistics  may  find  the  following  summary  sufficient. 


Summary. Each of the hypotheses H1a, H1b, H2a, and H2b makes predictions about whether Question Q4 will be answered with yes or no. We model these answers as a draw from a binomial distribution and we interpret these predictions as predictions about the probability of success for the binomial distribution. For Hypotheses H1a and H1b, we treat their predictions as the null hypotheses about the probability of success and attempt to reject them to disprove H1. We treat the predictions of H2a and H2b as the alternative hypotheses and attempt to reject their negations as null hypotheses to provide evidence in favor of H2. Table 5.5 presents how to convert these predictions into testable hypotheses. In short, we interpret a prediction that a question will be answered with a certain response as an assertion that the probability of success (seeing that response) is at least 0.5.

To test Hypothesis H1c, we use McNemar’s Test to test whether an action furthering a purpose has a statistically significant effect on how people answer Question Q4. We test Hypothesis H2c using McNemar’s Test across scenarios that only differ in the goal of the auditee’s plan.
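For readers unfamiliar with McNemar’s Test, the following sketch shows one common form of it: the exact (binomial) variant applied to the discordant pairs. The text does not state which variant of the test is used, so the choice of the exact variant, the function name `mcnemar_exact`, and the example counts are assumptions made for illustration.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: participants answering yes for the first scenario and no for the second.
    c: participants answering no for the first scenario and yes for the second.
    Participants giving the same answer to both scenarios do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null, each discordant pair falls either way with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical counts: ten participants changed their answer, all the same way.
p_value = mcnemar_exact(10, 0)
```

Because each participant answers every scenario, the responses across a pair of scenarios are paired rather than independent, which is why a test on the discordant pairs is appropriate here.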

We test Hypotheses H1’ and H2’ analogously using Question Q3 in the place of Q4. For all statistical tests, we use α = 0.05 for the threshold of statistical significance.²

¹ The odds of guessing correctly one of the questions is $\frac{1}{2}$ since there are two possible answers (ruling out I don’t know). Each of the four scenarios has two questions, meaning that seven or eight would have to be correctly guessed for a guessing participant to avoid rejection. We model these guesses using the binomial distribution, which has the cumulative distribution function $F(x; n, p) = \Pr[X \le x] = \sum_{i=0}^{x} \binom{n}{i} p^i (1 - p)^{n-i}$ where $x$ is the number of successes, $n$ the number of trials, and $p$ is the probability of success. In particular, we find that the odds of getting 7 or more successes is $1 - F(6; 8, \frac{1}{2})$ where $F(6; 8, \frac{1}{2})$ is the odds of getting 6 or fewer successes. This is $1 - F(6; 8, \frac{1}{2}) = 1 - \sum_{i=0}^{6} \binom{8}{i} (\frac{1}{2})^i (1 - \frac{1}{2})^{8-i} < 1 - 0.96 = 0.04$. Without ruling out the option of I don’t know, the odds of successfully avoiding detection would be $1 - F(6; 8, \frac{1}{3}) < 0.01$.

² The statistical tests used in this chapter are all from the “orthodox” frequentist interpretation of statistics. Bayesian statistics offers many advantages over frequentist statistics (see, e.g., [Jay03]). However, the analysis in this chapter is the current standard for the area. Furthermore, given the overwhelming evidence in favor of Hypothesis H2 and against H1, the flaws of frequentist methods should not affect any of the outcomes. Lastly, the author desires to perform the statistical analysis he selected before seeing the data to avoid the impression of changing analyses to reach a desired outcome. For these reasons, the author will leave a Bayesian analysis of the data as future work despite believing that one would be more mathematically justified and accurate.

5.3.1 Hypothesis Testing

An underlying presumption of this work is that purpose has an objective definition on which people generally agree. However, even under this presumption, we cannot expect that, for each question, every participant will respond with the same answer. Some participants might misread the question or hold non-standard views. Thus, we model each response to Question Q4 as a trial of a distribution over the three possible responses: yes, no, and I don’t know.

The hypotheses H1a, H1b, H2a, and H2b each make predictions about how people will answer Question Q4 in various scenarios. For example, Hypothesis H1a predicts that people will answer Question Q4 with yes rather than no when given a scenario of Class $C_{\bar{p}f}$. Literally interpreted, Hypothesis H1a predicts that the probability of answering yes under Scenario $S_{\bar{p}f}$, which we denote as $p_{\bar{p}fy}$, will be 1. However, as discussed above, we would expect to see the probability $p_{\bar{p}fy}$ being somewhat less than 1 even if Hypothesis H1a is true. The lower the probability, the more questionable the truth of the hypothesis becomes. The lower limit at which we reject the hypothesis as false depends upon how one formalizes the hypothesis. We choose to set this limit at the probability 0.5 since a hypothesis that does not correctly predict the majority of outcomes appears clearly false to us. Thus, we formalize this prediction as:

$\mathrm{H1a}_{0y}$. $p_{\bar{p}fy} \ge 0.5$

As we hope to disprove Hypothesis H1a, we would like to cast doubt on the Hypothesis $\mathrm{H1a}_{0y}$, which makes it a null hypothesis we hope to reject. Rejecting the null hypothesis provides evidence in favor of the alternative hypothesis we hope to show:

$\mathrm{H1a}_{ay}$. $p_{\bar{p}fy} < 0.5$

Since $\mathrm{H1a}_{0y}$ predicts a large number of yes responses and $\mathrm{H1a}_{ay}$ predicts a small number, the smaller the number of yes responses observed among the survey responses, the more likely $\mathrm{H1a}_{ay}$ seems relative to $\mathrm{H1a}_{0y}$. That is, seeing a small number $y$ or fewer yes responses is more unlikely under the assumption of $\mathrm{H1a}_{0y}$ than under the assumption $\mathrm{H1a}_{ay}$. As this small number $y$ decreases, the probability of seeing $y$ or fewer yes responses under the assumption of $\mathrm{H1a}_{0y}$ decreases. This probability is called the p-value. It is convenient to represent the p-value as $\Pr[Y \le y \mid \mathrm{H1a}_{0y}]$ where $Y$ is a random variable over the number of observed yes responses and $y$ is the actual number of observed yes responses. However, the hypothesis $\mathrm{H1a}_{0y}$ is a composite hypothesis asserting that $p_{\bar{p}fy} = p$ for some $p \ge 0.5$. Since we would like to disprove the null hypothesis for all these possible values of $p$, we use the upper bound as the p-value: $\max_{p : 0.5 \le p \le 1} \Pr[Y \le y \mid p_{\bar{p}fy} = p]$.

If the number $y$ of observed yes responses is small enough, then the p-value may become so small that we may confidently reject the null hypothesis $\mathrm{H1a}_{0y}$ in favor of the alternative $\mathrm{H1a}_{ay}$. Since we are looking for a low value for the number of yes responses to reject the null hypothesis, we are using a lower-tail rejection region.

We must decide how unlikely the observation must be before we are willing to reject the null hypothesis. This choice must balance the risk of incorrectly rejecting a null hypothesis that is actually true (called Type I Error) with the risk of incorrectly accepting a null hypothesis that is false (called Type II Error). Following convention, we choose the level of Type I Error to be α = 0.05. That is, we reject $\mathrm{H1a}_{0y}$ in favor of $\mathrm{H1a}_{ay}$ if the p-value (the probability of seeing the observed number of yes responses or fewer under the assumption that $\mathrm{H1a}_{0y}$ is true) is less than 0.05.

Hypothesis H1a also produces another prediction: that the number of no responses to Question Q4 will
be low for Scenario $S_{pf}$. We can also formalize this prediction as a null hypothesis that we hope to reject:

$\mathrm{H1a}_{0n}$: $p_{pfn} \le 0.5$

The alternative hypothesis that we hope to accept in place of the null hypothesis is

$\mathrm{H1a}_{an}$: $p_{pfn} > 0.5$

In this case, we become more willing to reject the null hypothesis $\mathrm{H1a}_{0n}$ as the number of no responses
increases. This creates an upper-tail rejection region in which we are interested in the probability of seeing
the observed number of no responses or more under the assumption that the null hypothesis is true. As
before, this quantity is called the p-value. We will again reject if the p-value is less than $\alpha = 0.05$.

We can also formalize the predictions made by Hypothesis H2a. However, as we hope to provide
evidence in favor of Hypothesis H2 instead of disproving it, we treat its predictions as alternative hypotheses
rather than null hypotheses. For the null hypotheses we use the negations of its predictions and attempt to
disprove them. For example, Hypothesis H2a predicts that the number of yes responses to Question Q4 for
Scenario $S_{pf}$ will be high. Thus, we attempt to provide evidence for the following alternative hypothesis:

$\mathrm{H2a}_{ay}$: $p_{pfy} > 0.5$

We do so by showing that seeing the observed number of yes responses or more (the p-value
using an upper-tail rejection region) is unlikely (less than $\alpha = 0.05$) under the assumption that the following
null hypothesis is true:

$\mathrm{H2a}_{0y}$: $p_{pfy} \le 0.5$

We may similarly formalize the other predictions of Hypotheses H1a and H2a, as well as the predictions
of Hypotheses H1b and H2b. We show each of these formalizations in Table 5.5 in the next section while
presenting the survey results.

These formalizations are, however, only useful if we can compute the p-value under each of
them. That is, we must have a formal model of the survey responses that allows us to compute the probability
of seeing the responses we observe under the null hypothesis. We now turn to describing such a model.

5.3.2  Binomial  Model  of  the  Survey 

Each null hypothesis that we test is an assertion about the probability of observing either a yes or a no
response. In the case that the null hypothesis is an assertion about the probability of observing yes, we
consider the response of yes to be a success outcome, representing a successful observation of the response
that the assertion concerns. We may collapse the responses of no and I don’t know into a single failure outcome
that represents failure to see yes. Likewise, in the case where the null hypothesis is an assertion about the
probability of observing no, we may treat no as a success outcome while treating yes and I don’t know,
jointly, as a failure outcome.

By using only two outcomes (success and failure), we may model each survey response as a Bernoulli
trial, which models the flipping of a possibly biased coin. The degree of bias determines the probability of
success, which models the probability of a respondent answering the question in the manner we are testing.

We model all the responses to a single question of our survey collectively as a series of identical independent
Bernoulli trials with each respondent corresponding to one trial. For a given number of trials and probability
of success for each trial, the binomial distribution provides the probability of seeing each possible number
of successes. (As we do not allow the same individual to take the survey more than once, the assumption
of identical independent trials is not completely satisfied since later responses are from a smaller pool of
possible respondents that does not include the previous respondents. This factor results in the hypergeometric
distribution being a more accurate model. However, since we are drawing our participants from a pool much
larger than the sample size, the binomial distribution provides a good approximation.) In particular, the
binomial distribution has the cumulative distribution function
$F(x; n, p) = \Pr[X \le x] = \sum_{i=0}^{x} \binom{n}{i} p^i (1 - p)^{n-i}$
where x is the number of successes, n is the number of trials, and p is the probability of success.
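The cumulative distribution function above can be evaluated directly from its definition. The following is a minimal sketch rather than the code used for the thesis; the function name `binom_cdf` and the toy input are assumptions made for illustration.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """Return F(x; n, p) = Pr[X <= x] for X ~ B(n, p).

    Hypothetical helper: sums the binomial probability mass
    function term by term, exactly as in the formula above.
    """
    if x < 0:
        return 0.0
    x = min(x, n)  # Pr[X <= x] is 1 for any x >= n
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


if __name__ == "__main__":
    # Toy input (not survey data): at most 2 successes in 10 fair trials.
    print(binom_cdf(2, 10, 0.5))
```

Summing the mass function is adequate for sample sizes on the order of a few hundred; a statistics library would be the more usual choice for larger n.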

Our  null  hypotheses  are  assumptions  about  the  value  of  the  success  probability  p  (not  to  be  confused 
with  the  idea  of  a  p-value).  Using  the  binomial  distribution,  we  may  determine  the  probability  of  seeing 
the  responses  observed  under  the  null  hypothesis.  However,  we  are  actually  interested  in  the  p-value:  the 
probability  of  seeing  a  set  of  responses  at  least  as  extreme  as  the  observed  one  where  the  meaning  of  extreme 
depends  upon  whether  we  are  using  a  lower-tail  or  an  upper-tail  rejection  region. 

For example, consider the null hypothesis $\mathrm{H1a}_{0y}$ that $p_{pfy} \ge 0.5$. We will reject $\mathrm{H1a}_{0y}$ using a
lower-tail rejection region if its p-value is less than $\alpha = 0.05$, where the p-value is the probability of seeing the
observed number of yes responses (success outcomes) or fewer. Under our binomial model, the p-value for
$\mathrm{H1a}_{0y}$ is

$$\max_{p : 0.5 \le p \le 1} \Pr[Y \le y \mid p_{pfy} = p] = \max_{p : 0.5 \le p \le 1} \Pr[Y \le y \mid Y \sim B(n, p)] = \max_{p : 0.5 \le p \le 1} F(x; n, p)$$

where $Y \sim B(n, p)$ asserts that Y is a random variable obeying the binomial distribution with a sample size
of n and success probability of p, and x = y is the observed number of yes responses.

We may use $F(x; n, 0.5)$ in place of $\max_{p : 0.5 \le p \le 1} F(x; n, p)$ since we will reject the null hypothesis
under the first value if and only if we reject it under the second value. The reason for this equivalence is that
$F(x; n, p)$ is a decreasing function of p and is therefore maximized at p = 0.5 when $0.5 \le p$.
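As an illustration of the lower-tail computation, the sketch below evaluates $F(x; n, 0.5)$ for two counts that appear in Table 5.4 (8 and 45 responses out of 187) and checks numerically that the CDF does not grow as p moves above 0.5. The helper names are assumptions, and this is not the author's analysis script. For 8 of 187 the result appears to agree with the 1.699463e-043 entries in Table 5.5.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """F(x; n, p) = Pr[X <= x] for X ~ B(n, p) (hypothetical helper)."""
    if x < 0:
        return 0.0
    x = min(x, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def lower_tail_p_value(x: int, n: int, p0: float = 0.5) -> float:
    """p-value for the null hypothesis p >= p0 with a lower-tail region.

    Because F(x; n, p) decreases in p, the maximum over p >= p0
    is attained at the boundary p = p0.
    """
    return binom_cdf(x, n, p0)


if __name__ == "__main__":
    n = 187  # respondents retained in the analysis
    for x in (8, 45):
        print(x, lower_tail_p_value(x, n))
    # Spot check of monotonicity on a coarse grid of p values.
    grid = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    values = [binom_cdf(45, n, p) for p in grid]
    print(all(a >= b for a, b in zip(values, values[1:])))
```

Both p-values fall far below $\alpha = 0.05$, so either count would lead to rejecting a null hypothesis of the form $p \ge 0.5$.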

For Hypothesis H2a, we are interested in the null hypothesis that $p_{pfy} \le 0.5$ using an upper-tail rejection
region. For this null hypothesis, the p-value equals

$$\max_{p : 0 \le p \le 0.5} 1 - F(x; n, p) = 1 - \min_{p : 0 \le p \le 0.5} F(x; n, p)$$

Similar to the case with the lower-tail rejection region, $\min_{p : 0 \le p \le 0.5} F(x; n, p)$ is equal to $F(x; n, 0.5)$ since
$F(x; n, p)$ is minimized at the largest available value of p, that is, 0.5.
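The upper-tail case can be sketched in the same way. The code below is an illustration under two stated assumptions: it computes $\Pr[X \ge x]$, which counts the observed value itself as the prose describes ("the observed number or more"), and it sums the upper tail directly instead of subtracting a CDF from 1. The function name is hypothetical. For 177 yes responses out of 187 (a count from Table 5.4) the result appears to agree with the 6.090736e-041 entries in Table 5.5.

```python
from math import comb


def upper_tail_p_value(x: int, n: int, p0: float = 0.5) -> float:
    """p-value for the null hypothesis p <= p0 with an upper-tail region.

    Returns Pr[X >= x] for X ~ B(n, p0). The tail is summed directly:
    computing 1 - F(x - 1; n, p0) in floating point would round a
    value near 1e-40 down to exactly 0.
    """
    x = max(x, 0)
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(x, n + 1))


if __name__ == "__main__":
    # 177 yes responses out of 187 respondents.
    print(upper_tail_p_value(177, 187))
```

The direct summation matters here because the reported p-values are many orders of magnitude smaller than double-precision machine epsilon.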

The ability to use $F(x; n, 0.5)$ in computing the p-value for both lower-tail and upper-tail rejection
regions justifies the convention of writing the null hypotheses using an equality rather than an inequality
relation. Whether the equality is shorthand for a greater-than-or-equal or a less-than-or-equal relation may be
inferred from the alternative hypothesis paired with the null hypothesis. We will adopt this convention for
the remainder of this work.

5.3.3 McNemar’s Test


To test hypotheses H1c and H2c, we must compare the responses across scenarios. These responses are not
independent  since  the  same  respondent  produces  responses  for  both  scenarios.  That  is,  the  responses  are 
produced  as  matched-pairs.  McNemar’s  test  provides  a  method  of  determining  from  these  matched-pairs 
the  effects  of  switching  between  the  two  scenarios  [McN47].  In  particular,  McNemar’s  test  examines  the 
number  of  pairs  where  the  response  switches  either  from  yes  to  no  or  from  no  to  yes.  The  test  approximates 
the  probability  of  the  number  of  switches  being  produced  by  two  dependent  draws  from  one  distribution.  If 
this  probability  is  small,  then  one  may  reject  the  null  hypothesis  that  switching  between  the  two  scenarios 
had  no  effect.  By  rejecting  this  null  hypothesis,  one  provides  evidence  for  the  alternative  hypothesis  that  the 
difference  between  the  two  scenarios  affected  the  responses. 
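To make the matched-pairs idea concrete, here is a minimal sketch of the chi-squared approximation to McNemar's test for responses collapsed to two outcomes. The source does not state which variant of the test or which software was used, so the continuity correction, the two-outcome collapse, the function name, and the example counts are all assumptions. The sketch returns NaN when no pair switches, because the statistic is then 0/0; this is one way a NaN result can arise with sparse data.

```python
from math import erfc, isnan, nan, sqrt


def mcnemar_p_value(b: int, c: int, correction: bool = True) -> float:
    """Chi-squared approximation to McNemar's test (hypothetical helper).

    b: number of pairs that switched from yes to no.
    c: number of pairs that switched from no to yes.
    Pairs with the same response in both scenarios are ignored.
    """
    discordant = b + c
    if discordant == 0:
        return nan  # no switches at all: the statistic is undefined
    diff = abs(b - c) - (1 if correction else 0)
    statistic = diff * diff / discordant
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return erfc(sqrt(statistic / 2))


if __name__ == "__main__":
    # Made-up counts, not taken from the survey.
    print(mcnemar_p_value(40, 5))   # lopsided switching
    print(mcnemar_p_value(10, 10))  # balanced switching
    print(isnan(mcnemar_p_value(0, 0)))
```

A small p-value indicates that switches in one direction greatly outnumber switches in the other, which is evidence that the change of scenario affected the responses.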

For example, for hypothesis H2c, we compare the responses to Question Q4 across the Scenarios $S_{pf}$
and $S_{\bar{p}f}$. We use the null hypothesis that whether the case worker employed a plan for treating the patient
has no effect on whether survey participants think the case worker violated the policy. If we find that a
large number of respondents have different responses across the two scenarios, then we would reject the null
hypothesis and conclude that the case worker’s planning does have an effect.

We test Hypothesis H1a’ in a manner similar to how we test Hypothesis H1a. However, we use Question Q3
instead of Question Q4. Analogously, we test Hypotheses H1b’, H1c’, H2a’, H2b’, and H2c’ in a
manner similar to Hypotheses H1b, H1c, H2a, H2b, and H2c, respectively, using Question Q3 in place of
Question Q4.


5.4  Results 

While  we  only  offered  to  pay  the  first  200  respondents,  we  received  207  completed  surveys.  The  extra 
surveys  may  have  resulted  from  people  misunderstanding  the  instructions  and  not  collecting  payment. 

Of  these  completed  surveys,  we  excluded  20  respondents  for  missing  two  or  more  of  the  objective 
questions.  All  of  the  statistics  shown  in  this  section  are  calculated  from  the  remaining  187  respondents. 
Appendix  B.4  shows  the  same  statistics  for  all  207  respondents.  Including  the  20  excluded  respondents  does 
not  change  the  significance  of  any  of  our  hypothesis  tests. 

Table 5.4 shows the distributions of responses for each question. Informally examining the tables shows
that the vast majority of the respondents conform to Hypothesis H2. For example, 177 (95%) of the
respondents answered Question Q4 for Scenario $S_{p\bar{f}}$ with the answer of yes as predicted by Hypothesis H2,
whereas only eight (4%) answered with no as predicted by Hypothesis H1. However, the difference is
less pronounced for Scenario $S_{\bar{p}f}$, where 133 (71%) match Hypothesis H2’s prediction of no and 45 (24%)
match H1’s prediction of yes. Interestingly, 31 (17%) answered yes for Scenario $S_{\bar{p}\bar{f}}$ despite both
hypotheses predicting no.

Table 5.5 shows the hypothesis tests we conducted using the binomial model. The top half of the table
shows tests intended to disprove Hypothesis H1 while the bottom half shows tests attempting to confirm
Hypothesis H2. Every test in favor of Hypothesis H2 obtains statistical significance. Eight of the 16 tests
against Hypothesis H1 obtain statistical significance. The eight that do not obtain significance are the cases
where the two hypotheses agree. In every case where the two disagree, both the test confirming
Hypothesis H2 and the one against Hypothesis H1 obtain significance.

Q1: Was the goal treatment? (question with an objectively correct answer)

Scenario                  Yes          I don’t know   No
$S_{pf}$                  186 (99%)    0 (0%)         1 (1%)
$S_{p\bar{f}}$            184 (98%)    1 (1%)         2 (1%)
$S_{\bar{p}f}$            12 (6%)      1 (1%)         174 (93%)
$S_{\bar{p}\bar{f}}$      6 (3%)       0 (0%)         181 (97%)

Q2: Was the treatment successful? (question with an objectively correct answer)

Scenario                  Yes          I don’t know   No
$S_{pf}$                  187 (100%)   0 (0%)         0 (0%)
$S_{p\bar{f}}$            2 (1%)       0 (0%)         185 (99%)
$S_{\bar{p}f}$            179 (96%)    0 (0%)         8 (4%)
$S_{\bar{p}\bar{f}}$      3 (2%)       0 (0%)         184 (98%)

Q3: Was the action for the purpose?

Scenario                  Yes          I don’t know   No
$S_{pf}$                  185 (99%)    2 (1%)         0 (0%)
$S_{p\bar{f}}$            183 (98%)    1 (1%)         3 (2%)
$S_{\bar{p}f}$            43 (23%)     6 (3%)         138 (74%)
$S_{\bar{p}\bar{f}}$      38 (20%)     10 (5%)        139 (74%)

Q4: Was the policy obeyed?

Scenario                  Yes          I don’t know   No
$S_{pf}$                  182 (97%)    2 (1%)         3 (2%)
$S_{p\bar{f}}$            177 (95%)    2 (1%)         8 (4%)
$S_{\bar{p}f}$            45 (24%)     9 (5%)         133 (71%)
$S_{\bar{p}\bar{f}}$      31 (17%)     9 (5%)         147 (79%)

Table 5.4: Survey Responses. In Scenario $S_{pf}$, the case worker’s goal was treatment and the treatment was successful;
in $S_{p\bar{f}}$, the goal was treatment and it failed; in $S_{\bar{p}f}$, the goal was cost reduction and the treatment succeeded; and in
$S_{\bar{p}\bar{f}}$, the goal was cost reduction and the treatment failed.

Testing         Alternative Hypothesis             Null Hypothesis                    p-Value          Significant?

Against H1a     $p_{pfy} < 0.5$                    $p_{pfy} = 0.5$                    1                No
Against H1a     $p_{pfn} > 0.5$                    $p_{pfn} = 0.5$                    1                No
Against H1a     $p_{\bar{p}fy} < 0.5$              $p_{\bar{p}fy} = 0.5$              3.28889e-013     Yes
Against H1a     $p_{\bar{p}fn} > 0.5$              $p_{\bar{p}fn} = 0.5$              3.527326e-009    Yes

Against H1a’    $p'_{pfy} < 0.5$                   $p'_{pfy} = 0.5$                   1                No
Against H1a’    $p'_{pfn} > 0.5$                   $p'_{pfn} = 0.5$                   1                No
Against H1a’    $p'_{\bar{p}fy} < 0.5$             $p'_{\bar{p}fy} = 0.5$             3.08316e-014     Yes
Against H1a’    $p'_{\bar{p}fn} > 0.5$             $p'_{\bar{p}fn} = 0.5$             2.662347e-011    Yes

Against H1b     $p_{p\bar{f}n} < 0.5$              $p_{p\bar{f}n} = 0.5$              1.699463e-043    Yes
Against H1b     $p_{p\bar{f}y} > 0.5$              $p_{p\bar{f}y} = 0.5$              6.090736e-041    Yes
Against H1b     $p_{\bar{p}\bar{f}n} < 0.5$        $p_{\bar{p}\bar{f}n} = 0.5$        1                No
Against H1b     $p_{\bar{p}\bar{f}y} > 0.5$        $p_{\bar{p}\bar{f}y} = 0.5$        1                No

Against H1b’    $p'_{p\bar{f}n} < 0.5$             $p'_{p\bar{f}n} = 0.5$             5.556827e-051    Yes
Against H1b’    $p'_{p\bar{f}y} > 0.5$             $p'_{p\bar{f}y} = 0.5$             2.570485e-049    Yes
Against H1b’    $p'_{\bar{p}\bar{f}n} < 0.5$       $p'_{\bar{p}\bar{f}n} = 0.5$       1                No
Against H1b’    $p'_{\bar{p}\bar{f}y} > 0.5$       $p'_{\bar{p}\bar{f}y} = 0.5$       1                No

For H2a         $p_{pfy} > 0.5$                    $p_{pfy} = 0.5$                    9.461645e-048    Yes
For H2a         $p_{pfn} < 0.5$                    $p_{pfn} = 0.5$                    5.556827e-051    Yes
For H2a         $p_{p\bar{f}y} > 0.5$              $p_{p\bar{f}y} = 0.5$              6.090736e-041    Yes
For H2a         $p_{p\bar{f}n} < 0.5$              $p_{p\bar{f}n} = 0.5$              1.699463e-043    Yes

For H2a’        $p'_{pfy} > 0.5$                   $p'_{pfy} = 0.5$                   8.961588e-053    Yes
For H2a’        $p'_{pfn} < 0.5$                   $p'_{pfn} = 0.5$                   5.097894e-057    Yes
For H2a’        $p'_{p\bar{f}y} > 0.5$             $p'_{p\bar{f}y} = 0.5$             2.570485e-049    Yes
For H2a’        $p'_{p\bar{f}n} < 0.5$             $p'_{p\bar{f}n} = 0.5$             5.556827e-051    Yes

For H2b         $p_{\bar{p}fn} > 0.5$              $p_{\bar{p}fn} = 0.5$              3.527326e-009    Yes
For H2b         $p_{\bar{p}fy} < 0.5$              $p_{\bar{p}fy} = 0.5$              3.28889e-013     Yes
For H2b         $p_{\bar{p}\bar{f}n} > 0.5$        $p_{\bar{p}\bar{f}n} = 0.5$        7.078408e-016    Yes
For H2b         $p_{\bar{p}\bar{f}y} < 0.5$        $p_{\bar{p}\bar{f}y} = 0.5$        1.479279e-021    Yes

For H2b’        $p'_{\bar{p}fn} > 0.5$             $p'_{\bar{p}fn} = 0.5$             2.662347e-011    Yes
For H2b’        $p'_{\bar{p}fy} < 0.5$             $p'_{\bar{p}fy} = 0.5$             3.08316e-014     Yes
For H2b’        $p'_{\bar{p}\bar{f}n} > 0.5$       $p'_{\bar{p}\bar{f}n} = 0.5$       9.252051e-012    Yes
For H2b’        $p'_{\bar{p}\bar{f}y} < 0.5$       $p'_{\bar{p}\bar{f}y} = 0.5$       4.896385e-017    Yes

Table 5.5: Binomial Hypothesis Tests

Testing         Alternative Hypothesis             Null Hypothesis

Proving H2a     $p_{pfy} > 0.94$                   $p_{pfy} = 0.94$
Proving H2a     $p_{pfn} < 0.05$                   $p_{pfn} = 0.05$
Proving H2a     $p_{p\bar{f}y} > 0.91$             $p_{p\bar{f}y} = 0.91$
Proving H2a     $p_{p\bar{f}n} < 0.08$             $p_{p\bar{f}n} = 0.08$

Proving H2a’    $p'_{pfy} > 0.96$                  $p'_{pfy} = 0.96$
Proving H2a’    $p'_{pfn} < 0.02$                  $p'_{pfn} = 0.02$
Proving H2a’    $p'_{p\bar{f}y} > 0.95$            $p'_{p\bar{f}y} = 0.95$
Proving H2a’    $p'_{p\bar{f}n} < 0.05$            $p'_{p\bar{f}n} = 0.05$

Proving H2b     $p_{\bar{p}fn} > 0.65$             $p_{\bar{p}fn} = 0.65$
Proving H2b     $p_{\bar{p}fy} < 0.3$              $p_{\bar{p}fy} = 0.3$
Proving H2b     $p_{\bar{p}\bar{f}n} > 0.73$       $p_{\bar{p}\bar{f}n} = 0.73$
Proving H2b     $p_{\bar{p}\bar{f}y} < 0.22$       $p_{\bar{p}\bar{f}y} = 0.22$

Proving H2b’    $p'_{\bar{p}fn} > 0.67$            $p'_{\bar{p}fn} = 0.67$
Proving H2b’    $p'_{\bar{p}fy} < 0.29$            $p'_{\bar{p}fy} = 0.29$
Proving H2b’    $p'_{\bar{p}\bar{f}n} > 0.68$      $p'_{\bar{p}\bar{f}n} = 0.68$
Proving H2b’    $p'_{\bar{p}\bar{f}y} < 0.26$      $p'_{\bar{p}\bar{f}y} = 0.26$

Table 5.6: Extreme Binomial Hypothesis Tests. This table shows the hypothesis test using the most extreme
probability for which statistical significance is still achieved and is accurate up to two places after the decimal point.


Since the results of the hypothesis testing were so strongly in favor of Hypothesis H2 using the
probability of 0.5 as the null hypothesis, we decided to calculate the most extreme probabilities that still obtain
significance. For testing that a probability is less than a value (lower-tail rejection region), the most extreme
value is the minimal value, whereas it is the maximum value for testing that a probability is greater than a
value (upper-tail rejection region). Table 5.6 shows these probabilities conservatively calculated up to 0.01
away from the true extreme probability. For example, the bottom row shows $p'_{\bar{p}\bar{f}y}$ is less than 0.26 with
statistical significance but not less than 0.25 with statistical significance. (This does not imply that $p'_{\bar{p}\bar{f}y} \ge 0.25$
with statistical significance.) As these probabilities are more extreme for Hypotheses H2a and H2a’ than for
Hypotheses H2b and H2b’, H2a and H2a’ appear to be more accurate. However, as we added these statistics
to the analysis after having conducted the survey, they may suffer from confirmation bias.
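The search for a most extreme probability can be sketched as a scan over a grid with step 0.01, matching the two-decimal accuracy stated for Table 5.6. The code below handles only the upper-tail case; the function names and the scanning strategy are assumptions, not a description of how the thesis values were produced, and whether it reproduces a particular table entry depends on using the same counts and rounding convention.

```python
from math import comb


def upper_tail(x: int, n: int, p: float) -> float:
    """Pr[X >= x] for X ~ B(n, p), summed directly (hypothetical helper)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(max(x, 0), n + 1))


def largest_significant_p0(x: int, n: int, alpha: float = 0.05):
    """Largest k/100 for which the null p <= k/100 is still rejected.

    Pr[X >= x] grows with p, so the last grid point whose p-value
    stays below alpha is the most extreme significant one.
    Returns None if no grid point is significant.
    """
    best = None
    for k in range(1, 100):
        if upper_tail(x, n, k / 100) < alpha:
            best = k / 100
    return best


if __name__ == "__main__":
    # 177 yes responses out of 187 respondents (a count from Table 5.4).
    print(largest_significant_p0(177, 187))
```

The lower-tail case is symmetric: scan for the smallest grid value whose lower-tail p-value remains below $\alpha$.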

Table 5.7 shows the results of using McNemar’s Test to compare the distribution of responses to one
question across two scenarios. For example, the last row compares the distribution producing responses
to Question Q3 for Scenario $S_{p\bar{f}}$ to that producing responses for Scenario $S_{\bar{p}\bar{f}}$. McNemar’s Test shows
that the differences in the observed responses are statistically significant. This result indicates that the two
distributions differ as predicted by Hypothesis H2c’. On the other hand, the fourth line of Table 5.7 shows
that the responses for Question Q3 do not differ significantly across Scenarios $S_{\bar{p}f}$ and $S_{\bar{p}\bar{f}}$. This result
differs from Hypothesis H1, which predicts that people would answer the question differently across the
two scenarios. McNemar’s Test validates all four predictions of Hypothesis H2. It validates one of the
predictions of Hypothesis H1. The statistic could not be computed in one case as the data was too sparse for
the calculation.

Testing      Question   Scenarios                                     p-Value          Significant?
For H1c      Q4         $S_{pf}$ vs. $S_{p\bar{f}}$                   NaN              No
For H1c      Q4         $S_{\bar{p}f}$ vs. $S_{\bar{p}\bar{f}}$       0.02674664       Yes
For H1c’     Q3         $S_{pf}$ vs. $S_{p\bar{f}}$                   0.3916252        No
For H1c’     Q3         $S_{\bar{p}f}$ vs. $S_{\bar{p}\bar{f}}$       0.3951831        No
For H2c      Q4         $S_{pf}$ vs. $S_{\bar{p}f}$                   1.020173e-029    Yes
For H2c      Q4         $S_{p\bar{f}}$ vs. $S_{\bar{p}\bar{f}}$       3.112267e-031    Yes
For H2c’     Q3         $S_{pf}$ vs. $S_{\bar{p}f}$                   5.186851e-031    Yes
For H2c’     Q3         $S_{p\bar{f}}$ vs. $S_{\bar{p}\bar{f}}$       8.40055e-031     Yes

Table 5.7: McNemar’s Tests Across Scenarios


Scenario                  Questions    p-Value      Significant?
$S_{pf}$                  Q4 vs. Q3    NaN          No
$S_{p\bar{f}}$            Q4 vs. Q3    NaN          No
$S_{\bar{p}f}$            Q4 vs. Q3    0.3843414    No
$S_{\bar{p}\bar{f}}$      Q4 vs. Q3    0.2239329    No

Table 5.8: McNemar’s Tests Across Questions


We were surprised to see the degree of difference between how people answered Questions Q4 and Q3.
For example, for Scenario $S_{\bar{p}\bar{f}}$, 79% of respondents answered Question Q4 with no whereas only 74%
answered Q3 with no despite our belief that both questions should be answered identically (see Table 5.4). To
test whether these differences are statistically significant, we used McNemar’s test to compare the responses
to these two questions within a single scenario. Table 5.8 shows the results. None of the tests showed a
statistically significant difference in how the questions were answered, but two of the tests failed to produce
a numeric p-value.


5.5  Limitations  of  Study 

Various  factors  affect  the  validity  of  our  conclusions.  We  discuss  each  of  them  below. 

By mentioning whether the auditee is performing the action as part of a plan, our scenarios force the participant
to consider the relationship between purposes and plans. It is possible that participants not primed to think
about planning would substantiate H1.

The use of Mechanical Turk raises questions about how representative our population sample is. Ross
et al. look at the demographics of Mechanical Turk workers and find that among U.S. workers, a
disproportionate number are female [RIS+10]. However, Berinsky, Huber, and Lenz find that Mechanical Turk
studies are as representative as, if not more representative than, the convenience samples commonly used in
research [BHL11]. While we attempted to limit our sample to adults in the United States, Mechanical Turk’s
ability to verify the qualification criteria is limited. Even given a representative pool of Mechanical Turk
workers for our sampling frame, our sample may be biased as the participants selected themselves to take our
survey rather than us having randomly selected them from the pool.

In  many  cases,  lawyers  write  privacy  policies  using  the  jargon  of  their  profession.  We  have  not  studied 
whether  lawyers  or  others  involved  in  the  writing  or  enforcing  of  privacy  policies  (e.g.,  auditors)  understand 
purpose  restrictions  in  a  manner  different  from  the  population  at  large,  which  we  attempted  to  sample  for  this 
survey.  While  surveying  such  professions  is  interesting  future  work,  we  believe  that  the  lay  understanding 
is  important  since  public-facing  privacy  policies  often  contain  purpose  restrictions. 

The use of paid but unmonitored participants also raises concerns that participants might provide
arbitrary answers to speed through the questionnaire. Kittur, Chi, and Suh present experimental results of using
Mechanical Turk for user studies [KCS08]. They conclude that Mechanical Turk can be useful if one
eliminates such spurious submissions by including questions with known answers and rejecting participants who
fail to correctly answer these questions. We follow this protocol by using Questions Q1 and Q2 to force the
participant to read the scenarios and by notifying survey participants that we may withhold payment if they
answer arbitrarily. Answering the remaining questions (Q3, Q4, and Q5) becomes fairly easy after having
correctly answered Questions Q1 and Q2. By making the additional work required for meaningful
participation small, we hope to have reduced arbitrary responses. However, by threatening to withhold payment,
we may have increased the demand effect, the tendency of participants to provide the answers they believe
the surveyor would like to observe as opposed to their honest opinions (see, e.g., [Orn09]).

Some respondents might answer later questions in a manner consistent with their answers to earlier
questions despite having differing opinions. This bias could arise since some of the differences between
questions may appear trivial, especially since we made each question similar to the others to reduce
confounding factors. As no scenario has the same answers to both Questions Q1 and Q2 together as any other
scenario, we hope to have reduced this bias.

Nonattitudes  occur  when  a  participant  arbitrarily  selects  a  response  since  they  do  not  have  an  opinion  on 
a  question.  To  reduce  the  effect  of  nonattitudes,  we  included  the  option  of  a  I  don’t  know  response. 

We  do  not  claim  that  the  questionnaire  tests  all  relevant  factors  (i.e.,  we  do  not  claim  high  content 
validity).  Indeed,  we  did  not  test  some  factors  that  we  suspect  may  affect  respondents  such  as  whether  the 
policy  is  perceived  as  good  or  bad. 

Another concern is that respondents may change their opinions over time. We did not perform a follow-up
study to determine how reliable our survey is over time.

It  is  also  possible  that  our  survey  questions  are  not  understood  by  the  respondents  in  a  manner  consistent 
with  testing  the  meaning  of  purpose.  The  various  forms  of  validity  discussed  below  attempt  to  determine 
whether  our  survey  actually  measured  the  concepts  in  which  we  are  interested. 

We  believe  that  our  survey  has  face  validity.  That  is,  we  believe  that  our  questions  are,  on  their  face, 
well  worded  for  testing  our  hypotheses. 

Including both Questions Q3 and Q4 not only allowed us to compare the truth of Hypothesis H1a to H1a’
(and likewise with the other unprimed-primed pairs of hypotheses), but also to see the effects of changing
the wording of the questions. As the respondents typically answered these two questions in the same manner,
we believe that our results are not overly influenced by the wording of the questions and pertain to the
underlying concepts. That is, we believe our survey has convergent validity. However, the fact that some respondents
varied their responses across Questions Q4 and Q3 within a single scenario deserves further investigation.

As we know of no previous empirical research addressing the issues tested by our study, we cannot
compare our results to those already proved about the meaning of purpose. Thus, we cannot argue that
our survey has construct validity by showing that it agrees with previous results. However, prior work has
studied how people assign goals to actions [BTS06, BTS07, BST09, BST11]. These studies have found
that planning-based models similar to ours predict the goals people assign to animated characters better than
heuristics.

A survey respondent may confuse the concepts we are testing with related ones, reducing the divergent
validity of our survey. For example, rather than actually answer Question Q4, they may instead provide
the answer to the following question: “Was the case worker’s action consistent with someone seeking
treatment for the patient?” Such confusion may explain some of the unexpected variation in responses between
Questions Q3 and Q4.

The  ultimate  goal  of  our  work  is  to  determine  how  people  think  policies  involving  the  concept  of  purpose 
should  be  enforced.  Our  survey  is  detached  from  any  actual  enforcement.  Respondents  might  behave 
differently  than  their  responses  suggest  given  the  task  of  actually  enforcing  a  policy.  They  may  also  differ 
from  their  responses  in  their  feelings  if  they  were  actually  subject  to  such  a  policy.  Our  survey  is  most 
similar  to  the  respondent  acting  as  a  neutral  third-party  or  judge  in  a  dispute  over  the  meaning  of  a  policy. 
However,  even  in  such  a  role,  the  respondent’s  behavior  may  differ  from  that  suggested  from  his  responses. 
Ideally,  our  survey  will  predict  with  a  high  degree  of  accuracy  how  the  respondents  would  behave  in  each  of 
these  three  roles  (policy  enforcer,  policy  subject,  or  neutral  third-party)  establishing  that  our  survey  actually 
corresponds  to  the  behavior  we  wish  to  study  (i.e.,  has  criterion  validity).  However,  we  have  not  established 
this  form  of  validity. 

5.6  Discussion 

The results shown above provide evidence in favor of defining an action to be for a purpose if and only if an
agent performed the action as part of a plan for furthering that purpose (Hypothesis H2). The binomial tests
provide strong evidence against defining an action to be for a purpose if and only if that action furthered the
purpose (Hypothesis H1). McNemar’s test provides some support for Hypothesis H1. Indeed, informally
examining the response distributions (Table 5.4), it appears Hypothesis H1 does accurately model a small
minority of respondents. However, Hypothesis H2 appears to accurately model a much larger number
of respondents. For these reasons, we conclude that Hypothesis H2 provides a superior model to that of
Hypothesis H1.

Nevertheless, the relative strength of Hypothesis H2a compared to Hypothesis H2b suggests that some
people feel that an action being part of a plan for a purpose is sufficient but not necessary for the action to be
for that purpose. Examining free-form responses to Question Q5 suggests that some people feel that the action
of sharing a record is for the purpose of treatment since it is the same action that would be taken had the case
worker been planning for treatment. This suggests a third class of hypotheses:

H3 The auditee obeys the (purpose) restriction if and only if the auditee performed an action that a
hypothetical agent would take had it planned for the purpose.

H3’ An action is for a purpose if and only if that action is the action a hypothetical agent would take had it
planned for the purpose.

These hypotheses place strictly weaker restrictions on the auditee’s behavior, consistent with the idea that
H2 is sufficient but not necessary. Interestingly, they match the approximations our algorithm makes in
attempting to enforce Hypothesis H2. Unfortunately, because the survey does not mention whether the case
worker’s choice to forward the record in Scenarios $S_{\bar{p}f}$ and $S_{\bar{p}\bar{f}}$ is consistent with the actions of a
hypothetical agent planning for treatment, we cannot test these hypotheses using the conducted survey.

Chapter 6


Multiple  Purposes  and  Limitations 


6.1  Introduction 

So far, our formalism allows our hypothetical agent to consider only a single purpose. However, auditees
may perform an action for more than one purpose. In many cases, the auditor may simply ignore any action
that is neither governed by the privacy policy nor relevant to the plans the auditee is employing that use
governed actions.

In the physician example of Chapter 2, the physician already implicitly considered many other purposes
before even seeing the current patient. For example, the physician presumably performed many actions not
mentioned in the model in between taking the X-ray, sending it, and making a diagnosis, such as going on
a coffee break. As these actions are not governed by the privacy policy and neither improve nor degrade
the diagnosis even indirectly, the auditor may safely ignore them. Thus, our semantics can handle multiple
purposes in this limited fashion.

However,  in  other  cases,  the  interactions  between  purposes  become  important.  Below  we  discuss  two 
complementary  ways  that  an  auditee  can  consider  multiple  purposes  that  produce  interactions.  In  the  first, 
the  auditee  considers  one  purpose  after  another.  In  the  second,  the  auditee  attempts  to  optimize  for  multiple 
purposes  simultaneously.  We  find  that  our  semantics  may  easily  be  extended  to  handle  the  first,  but  difficul¬ 
ties  arise  for  the  second.  We  end  the  section  by  considering  what  features  a  formalism  would  need  to  handle 
simultaneous  consideration  of  purposes  and  the  challenges  they  raise  for  auditing. 

6.2  Sequential  Consideration 

Yahoo!’s  privacy  policy  states  that  they  will  not  contact  children  for  the  purpose  of  marketing  [Yah10a].
Suppose  Yahoo!  decides  to  change  the  name  of  games.yahoo.com  to  fun.yahoo.com  because  they
believe  the  new  name  will  be  easier  to  market.  They  notify  users  of  games.yahoo.com,  including  children,
of  the  upcoming  change  so  that  they  may  update  their  bookmarks.

In  this  example,  the  decision  to  change  names,  made  for  marketing,  causes  Yahoo!  to  contact  children. 
However,  we  do  not  feel  that  this  is  a  violation  of  Yahoo!’s  privacy  policy.  A  decision  made  for  marketing
altered  the  expected  future  of  Yahoo!  in  such  a  way  that  customer  service  would  suffer  if  Yahoo!  did  nothing. 
Thus,  to  maintain  good  customer  service,  Yahoo!  made  the  decision  to  notify  users  without  further  consideration
of  marketing.  Since  Yahoo!  did  not  consider  the  purpose  of  marketing  while  making  this  decision,
contacting  the  children  was  not  for  marketing,  even  though  the  decision  to  contact  children  took  into  account
the  consequences  of  a  name  change  that  was  itself  made  for  marketing.

Bratman  describes  such  planning  in  his  work  formalizing  intentions  in  the  Belief-Desire-Intention  (BDI) 
model  [Bra87].  He  views  it  as  a  sequence  of  planning  steps  in  which  the  intention  to  act  (e.g.,  to  change 
the  name)  at  one  step  may  affect  the  plans  formed  at  later  steps.  In  particular,  each  step  of  planning  starts 
with  a  model  of  the  environment  that  is  refined  by  the  intentions  formed  by  each  of  the  previous  planning 
steps.  The  step  then  creates  a  plan  for  a  purpose  that  further  refines  the  model  with  new  intentions  resulting 
from  this  plan.  A  purpose  associated  with  a  previous  step  constrains  the  choices  available  at  a  later  step  of 
planning.  Thus,  a  purpose  associated  with  a  previous  step  may  affect  the  plan  formed  in  a  later  step  for  a 
different  purpose.  We  adopt  the  stance  that  an  action  selected  at  a  step  is  for  the  purpose  optimized  at  that 
step.  However,  the  action  is  not  also  for  other  previous  purposes  even  if  the  constraints  created  previously 
by  planning  for  those  purposes  affect  which  action  the  agent  chooses  at  the  current  step. 

Baker  et  al.  formalize  a  simple  form  of  sequential  planning  for  an  MDP  model  with  multiple  goals,  but
they  do  not  support  intentions  from  previous  goals  affecting  future  goals  [BTS07,  BST09]. 

6.3  Simultaneous  Consideration 

At  other  times,  an  auditee  might  consider  more  than  one  purpose  in  the  same  step.  For  example,  the  physician
may  have  both  to  provide  quality  treatment  and  to  respect  the  patient’s  financial  concerns.  In  this  case,
the  physician  may  not  be  able  to  simultaneously  provide  the  highest  quality  care  at  the  lowest  price.  The  two 
competing  concerns  must  be  balanced  and  the  result  may  not  maximize  the  satisfaction  of  either  of  them. 

The  traditional  way  of  modeling  the  simultaneous  optimization  of  multiple  rewards  is  to  combine  them 
into  a  single  reward  using  a  weighted  average  over  the  rewards.  Each  reward  would  be  weighted  by  how 
important  it  is  to  the  auditee  performing  the  optimization.  This  amalgamation  of  the  various  purpose  rewards 
makes  it  difficult  to  determine  for  which  purpose  various  actions  are  selected. 
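
The loss of attribution can be seen in a minimal sketch of the weighted-average construction. The treatment options, the purpose names, and all numbers below are hypothetical and chosen only for illustration; the function name combine is likewise an assumption of this sketch and not part of our formalism.

```python
from typing import Dict


def combine(rewards: Dict[str, Dict[str, float]],
            weights: Dict[str, float]) -> Dict[str, float]:
    """Collapse per-purpose rewards into one weighted-average reward per action."""
    total = sum(weights.values())
    return {
        action: sum(weights[p] * r[p] for p in weights) / total
        for action, r in rewards.items()
    }


# Hypothetical rewards of each option for the two purposes (not from the text).
rewards = {
    "premium_scan": {"quality": 10.0, "cost": 0.0},
    "standard_scan": {"quality": 7.0, "cost": 6.0},
    "no_scan": {"quality": 0.0, "cost": 10.0},
}
weights = {"quality": 0.5, "cost": 0.5}

combined = combine(rewards, weights)
best_combined = max(combined, key=combined.get)
best_quality = max(rewards, key=lambda a: rewards[a]["quality"])
best_cost = max(rewards, key=lambda a: rewards[a]["cost"])
```

Under these assumed numbers the option that maximizes the combined reward maximizes neither purpose alone, and the scalar in combined no longer records which purpose contributed what.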

One  possibility  is  to  analyze  the  situation  using  counterfactual  reasoning  (see,  e.g.,  [Mac74]).  For  example,
given  that  the  auditee  performed  an  action  a  while  optimizing  a  combination  of  purposes  p_1  and
p_2,  the  auditor  could  ask  if  the  auditee  would  have  still  performed  the  action  a  even  if  the  auditee  had  not
considered  the  purpose  p_1  and  had  only  optimized  the  purpose  p_2.  If  not,  then  the  auditor  could  determine
that  the  action  was  for  p_1.  However,  as  the  next  example  shows,  such  reasoning  is  not  sufficient  to  determine
the  purposes  of  the  actions. 
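
As a rough illustration, the counterfactual test can be sketched for a single decision step. The sketch assumes that purposes are combined by an unweighted sum and that "would still have performed the action" means the action remains among the optimal ones; both are simplifying assumptions. The action names, the rewards, and the function names are hypothetical.

```python
def optimal_actions(rewards, purposes):
    """Actions maximizing the unweighted sum of rewards over the given purposes."""
    score = {a: sum(r[p] for p in purposes) for a, r in rewards.items()}
    top = max(score.values())
    return {a for a, s in score.items() if s == top}


def counterfactually_for(action, purpose, rewards, purposes):
    """True if the action stops being optimal once `purpose` is no longer considered."""
    remaining = [p for p in purposes if p != purpose]
    return action not in optimal_actions(rewards, remaining)


# Hypothetical one-step decision with three actions and two purposes.
rewards = {
    "a": {"p1": 3, "p2": 1},
    "b": {"p1": 0, "p2": 2},
    "c": {"p1": 1, "p2": 1},
}
purposes = ["p1", "p2"]

chosen = optimal_actions(rewards, purposes)
a_for_p1 = counterfactually_for("a", "p1", rewards, purposes)
a_for_p2 = counterfactually_for("a", "p2", rewards, purposes)
```

Here the auditee picks a when optimizing both purposes, would have picked b had it ignored p1, and would still have picked a had it ignored p2, so the test attributes a to p1 only.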

To  show  the  generality  of  purposes,  we  consider  an  example  involving  travel  reimbursement  instead 
of  privacy.  Consider  a  Philadelphian  who  needs  to  go  to  New  York  City  for  a  business  meeting  with  his 
employer  and  is  invited  to  give  a  lecture  at  a  conference  in  Washington,  D.C.,  with  his  travel  expenses 
reimbursed  by  the  conference.  He  could  drive  to  either  New  York  or  Washington  (modeled  as  the  actions 
driveNY  and  driveDC,  respectively).  However,  due  to  time  constraints  he  cannot  drive  to  both  of  them. 
To  attend  both  events,  he  needs  to  fly  to  both  (modeled  as  actions  flyNY  and  flyDC).  As  flying  is  more
expensive,  both  driving  actions  receive  a  higher  reward  than  flying  (2  instead  of  1),  but  flying  is  better  than 
not  going  (0).  Figure  6.1  models  the  traveler’s  environment. 

Given  these  constraints,  he  decides  to  fly  to  both  only  to  find  auditors  at  both  events  scrutinizing  his 
decision.  For  example,  an  auditor  working  for  the  conference  could  find  that  his  flight  to  Washington  was 
not  for  the  lecture  since  the  traveler  would  have  driven  had  it  not  been  for  work.  If  the  conference’s  policy 
requires  that  reimbursed  flights  are  only  for  the  lecture,  the  auditor  might  deny  reimbursement.  However, 
the  employer  seems  even  less  likely  to  reimburse  the  traveler  for  his  flight  to  Washington  since  the  flight  is
redundant  for  getting  to  New  York.

Figure  6.1:  Model  of  a  traveler  deciding  whether  to  fly  or  drive.  Since  every  transition  is  deterministic,  we  represent
each  as  a  single  arrow.  Each  is  labeled  with  the  action  name,  the  rewards  for  business  and  the  rewards  for  lecturing  in
that  order.  Self-loops  of  zero  reward  are  not  shown,  including  all  those  labeled  with  the  do-nothing  action  stop.

However,  under  the  semantics  discussed  above,  each  flight  would  be  for  both  purposes  since  only  when 
the  traveler  considers  both  does  he  decide  to  take  either  flight.  While  having  the  conference  reimburse  the 
traveler  for  his  flight  to  Washington  seems  reasonable,  the  idea  that  they  should  also  reimburse  him  for  his 
flight  to  New  York  appears  counterintuitive. 

Our  approach  of  sequential  planning  also  cannot  explain  this  example.  To  plan  sequentially,  the  traveler 
must  consider  one  of  the  two  events  first.  If,  for  example,  he  considers  New  York  first,  he  will  decide  to 
drive  to  New  York  and  then  decline  the  invitation  to  Washington.  Only  by  considering  both  events  at  once
does  he  decide  to  fly,  which  lets  him  satisfy  both  purposes.
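
The failure of sequential planning on this example can be checked with a small sketch. It uses the rewards stated above (2 for driving, 1 for flying, 0 for not going) but replaces the model of Figure 6.1 with an assumed simplification: a plan is a set of travel actions, and the time constraint is read as "any plan that includes a drive contains no other trip". Ties are broken in favor of fewer actions. The names feasible, value, and plan_step are hypothetical.

```python
from itertools import combinations

# (business, lecture) rewards as stated in the text.
REWARD = {
    "driveNY": (2, 0), "flyNY": (1, 0),
    "driveDC": (0, 2), "flyDC": (0, 1),
}
BUSINESS, LECTURE = 0, 1


def feasible(plan):
    """Assumed reading of the time constraint: a drive excludes any other trip."""
    drives = any(a.startswith("drive") for a in plan)
    return not (drives and len(plan) > 1)


PLANS = [frozenset(c) for n in range(3)
         for c in combinations(REWARD, n) if feasible(c)]


def value(plan, purpose):
    return sum(REWARD[a][purpose] for a in plan)


def plan_step(committed, purpose):
    """Best feasible plan for `purpose` that keeps every earlier intention."""
    options = [p for p in PLANS if committed <= p]
    return max(options, key=lambda p: (value(p, purpose), -len(p)))


after_business = plan_step(frozenset(), BUSINESS)   # considers New York first
after_lecture = plan_step(after_business, LECTURE)  # then considers the lecture
```

Considering New York first commits the traveler to driveNY, after which no feasible plan reaches Washington. The plan {flyNY, flyDC}, the only feasible plan with a positive reward for both purposes, is never produced by this procedure in either order.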

We  believe  resolving  this  conflict  requires  extending  our  semantics  to  consider  requirements  that  an 
action  be  for  a  purpose  (as  opposed  to  not  for  or  only  for).  Furthermore,  we  believe  that  the  optimization 
of  combinations  of  purposes  does  not  accurately  model  human  planning  with  multiple  purposes.  Intuitively, 
the  traveler  selects  flyDC  not  for  work  but  also  not  only  for  the  conference.  Rather,  flyDC  seems  to  be  for  the
conference  under  the  constraint  that  it  must  not  prevent  the  traveler  from  attending  the  meeting.  In  the  next
section,  we  consider  the  possibility  of  modeling  human  planning  more  accurately. 

Handling  multiple  purposes  is  likely  to  lead  to  computational  intractability.  Blokpoel,  Kwisthout,  van 
der  Weide,  and  Rooij  examine  goal  inference  for  an  MDP-like  model  with  multiple  goals  [BKvdWvR10].
They  find  that  supporting  numerous  goals  (or  purposes)  results  in  intractability.  Given  that  their  model  does 
not  handle  intentions,  we  suspect  that  the  approach  we  envision  will  be  even  more  complex. 


6.4  Modeling  Human  Planning 

MDPs  and  POMDPs  are  useful  for  automated  planning  and  producing  optimal  strategies  for  informing 
operating  procedures  as  discussed  in  Chapter  4.  However,  they  are  not  specialized  for  modeling  planning 
by  humans  who  plan  in  a  significantly  different  manner.  First,  while  planning,  humans  tend  not  to  combine 
their  purposes  into  a  single  reward  or  utility  function  as  suggested  above,  but  rather  maintain  a  vector  of 
reward  functions  each  corresponding  to  a  separate  purpose  [Sim55].  Second,  except  while  playing  well-defined
games,  constructing  an  MDP  or  POMDP  model  of  the  environment  is  a  costly  process  involving  the
collection  of  information  and  its  analysis.  Indeed,  the  auditee  will  need  a  plan  to  construct  his  model  [Sti61],
but  such  a  planning  problem  is  even  more  complex  leading  to  an  infinite  regress  [Sel02].  Third,  while  the
algorithms  used  for  MDP  optimization  are  conceptually  simple,  they  involve  numeric  calculations  beyond
the  comfort  of  humans  [Sim55];  POMDP  optimization  is  difficult  even  for  computers  [Mad00].  Fourth,
experiments  suggest  that  humans  tend  not  to  use  a  discounting  factor  such  as  γ,  but  rather  discount  in  a
more  ad  hoc  manner  resembling  hyperbolic  discounting  [Ain92]. 
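
The difference between the two kinds of discounting can be illustrated with a short sketch. It assumes the usual textbook forms, γ^t for exponential discounting and 1/(1 + kt) for hyperbolic discounting; the parameter values, the reward sizes, and the function names are hypothetical and are not taken from [Ain92].

```python
def exponential(delay, gamma=0.9):
    """Exponential discounting with a fixed factor gamma per time step."""
    return gamma ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discounting in its common 1 / (1 + k * delay) form."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, wait, small=50.0, large=100.0, gap=5):
    """True if the larger reward, arriving `gap` steps after the smaller, is preferred."""
    return large * discount(wait + gap) > small * discount(wait)


exp_now, exp_far = prefers_later(exponential, 0), prefers_later(exponential, 10)
hyp_now, hyp_far = prefers_later(hyperbolic, 0), prefers_later(hyperbolic, 10)
```

With a fixed factor the preference between the two rewards does not depend on how far away both are, whereas the hyperbolic agent in this sketch prefers the smaller reward when it is immediate and the larger one when both are distant. A model with a single γ cannot reproduce such a reversal.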

For  these  reasons,  others  have  searched  for  more  tailored  models  of  human  planning  [Sim55,  GS02]. 
Simon  proposed  to  model  humans  as  having  bounded  rationality  to  account  for  their  limitations  and  their 
lack  of  information  [Sim55].  Work  on  formalizing  bounded  rationality  has  resulted  in  a  variety  of  planning 
principles  ranging  from  the  systematic  (e.g.,  Simon’s  satisficing)  to  the  heuristic  (see,  e.g.,  [Gig02]).  However,
“[a]  comprehensive,  coherent  theory  of  bounded  rationality  is  not  available”  [Sel02,  p14]  and  there
still  is  “a  significant  amount  of  unpredictability  in  how  an  animal  or  a  human  being  will  undertake  to  solve 
a  problem”  such  as  planning  [DKP96,  p40]. 

We  view  creating  semantics  more  closely  tied  to  human  planning  as  interesting  future  work.  However,
modeling  human  planning  may  prove  complex  enough  to  justify  accepting  the  imperfections  of  semantics
such  as  ours  or  even  heuristic-based  approaches  for  finding  violations  such  as  the  query  intrusion  model
discussed  above  [AKSX02]. 

Despite  these  difficulties,  one  could  look  for  discrepancies  between  a  semantics  of  purpose  restrictions 
and  experimental  results  on  planning.  In  this  manner  one  could  judge  how  closely  a  semantics  approximates 
human  planning  in  the  ways  relevant  to  purpose  restrictions.

In  particular,  our  semantics  appears  to  hold  human  auditees  to  too  high  a  standard:  they  are  unlikely  to
always  be  able  to  pick  the  optimal  strategy  for  a  purpose.  When  enforcing  an  exclusivity  rule,  this  strictness 
could  result  in  the  auditor  investigating  some  auditees  who  honestly  planned  for  the  only  allowed  purpose, 
but  failed  to  find  the  optimal  policy.  While  such  investigations  would  be  false  positives,  they  do  have  the 
pleasing  side-effect  of  highlighting  areas  in  which  an  auditee  could  improve  his  planning. 

In  the  case  of  enforcing  prohibitive  rules,  this  strictness  could  cause  the  auditor  to  miss  some  violations 
that  do  not  optimize  the  prohibited  purpose,  but,  nevertheless,  are  for  the  purpose.  The  additional  checks 
proposed  at  the  end  of  Section  2.2.3  could  be  useful  for  detecting  these  violations:  if  the  auditee’s  actions  are 
not  consistent  with  a  strategy  that  optimizes  any  of  the  allowed  purposes  but  does  improve  to  some  degree 
the  prohibited  purpose,  the  actions  may  warrant  extra  scrutiny. 

Prior  work  on  goal  inference  has  used  more  relaxed  criteria  than  optimality  to  account  for  a  human’s
inability  to  optimize  (e.g.,  [BST11,  RG11]).  These  relaxations  amount  to  allowing  the  auditee  to  deviate
from  the  optimal  strategy  with  a  probability  that  decreases  with  the  suboptimality  of  the  deviation  (or  an
approximation  of  the  suboptimality  in  the  case  of  [BST11]).  We  believe  that  an  accurate  model  would  also  have  to
allow  for  deviations  due  to  the  difficulty  of  determining  the  consequences  of  an  action. 
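
One common relaxation of this kind is a softmax (Boltzmann) choice rule, sketched below. Whether the cited works use exactly this form is an assumption of the sketch; the action names, the values, and the parameter beta are hypothetical.

```python
import math


def softmax_policy(q_values, beta=1.0):
    """Choose each action with probability proportional to exp(beta * value)."""
    top = max(q_values.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = {a: math.exp(beta * (q - top)) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical expected values of three actions for the purpose being audited.
q = {"optimal": 2.0, "slightly_worse": 1.5, "much_worse": 0.0}
policy = softmax_policy(q, beta=2.0)
```

The optimal action remains the most likely, a slightly suboptimal action is still fairly probable, and a badly suboptimal one is rare. As beta grows, the rule approaches the strict optimality that our semantics assumes.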

While  our  semantics  is  limited  by  our  understanding  of  human  planning,  it  still  reveals  concepts  crucial 
to  the  meaning  of  purpose.  Ideas  such  as  planning  and  non-redundancy  will  guide  future  investigations  on 
the  topic. 
Chapter  7


Related  Work 


7.1  Applying  our  Formalism  to  Past  Methods 

Past  methods  of  enforcing  purpose  restrictions  have  not  provided  a  means  of  assigning  purposes  to  sequences 
of  actions.  Rather,  they  presume  that  the  auditor  (or  someone  else)  already  has  a  method  of  determining 
which  behaviors  are  for  a  purpose.  In  essence,  these  methods  presuppose  that  the  auditor  already  has  the 
set  of  allowed  behaviors  nbehv(rp)  for  the  purpose  p  that  he  is  enforcing.  These  methods  differ  in  their 
intensional  representations  of  the  set  nbehv(rp).  Thus,  some  may  represent  a  given  set  exactly  while  others 
may  only  be  able  to  approximate  it.  These  differences  mainly  arise  from  the  different  mechanisms  they 
use  to  ensure  that  the  auditee  only  exhibits  behaviors  from  nbehv(rp).  We  use  our  semantics  to  study  how 
reasonable  these  approximations  are. 

Research  Efforts.  Byun  et  al.  use  role-based  access  control  [San96]  to  present  a  methodology  for  or¬ 
ganizing  privacy  policies  and  their  enforcement  [BBL05,  BL08,  NBL+10].  They  associate  purposes  with 
sensitive  resources  and  with  roles,  and  their  methodology  only  grants  the  user  access  to  the  resource  when 
the  purpose  of  the  user’s  role  matches  the  resource’s  purpose.  The  methodology  does  not,  however,  explain 
how  to  determine  which  purposes  to  associate  with  which  roles.  Furthermore,  a  user  in  a  role  can  perform 
actions  that  do  not  fit  the  purposes  associated  with  his  role  allowing  him  to  use  the  resource  for  a  purpose 
other  than  the  intended  one.  Thus,  their  method  is  only  capable  of  enforcing  policies  when  there  exists  some 
subset  A'  of  the  set  of  actions  A  such  that  nbehv(rp)  is  equal  to  the  set  of  all  interleavings  of  A'  with  S
of  finite  but  unbounded  length  (i.e.,  nbehv(rp)  =  (S  ×  A')*).  The  subset  A'  corresponds  to  those  actions
that  use  a  resource  with  the  same  purpose  as  the  auditee’s  role.  Despite  these  limitations,  their  method  can 
implement  the  run-time  enforcement  used  at  some  organizations,  such  as  a  hospital  that  allows  physicians 
access  to  any  record  to  avoid  denying  access  in  time-critical  emergencies.  However,  it  does  not  allow  the 
fine-grain  distinctions  used  during  post-hoc  auditing  done  at  some  hospitals  to  ensure  that  physicians  do  not 
abuse  their  privileges.  Group-centric  access  control  offers  similar  advantages  but  suffers  from  the  same
shortcomings  [KSNW09].
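
The shape of the enforceable sets can be made concrete with a minimal sketch. A behavior is represented as a finite sequence of (state, action) pairs, and membership depends only on whether every action lies in the permitted subset of actions (written A' here). The state and action names are hypothetical, as is the function name.

```python
def in_enforceable_set(behavior, permitted_actions):
    """Membership test for the set of finite sequences of (state, permitted action) pairs."""
    return all(action in permitted_actions for _state, action in behavior)


# Hypothetical actions whose resource purpose matches the physician role.
permitted = {"read_record", "order_test"}

treating = [("patient_in_exam_room", "read_record"), ("record_open", "order_test")]
snooping = [("celebrity_admitted", "read_record")]
emailing = [("record_open", "email_record")]

results = [in_enforceable_set(b, permitted) for b in (treating, snooping, emailing)]
```

Because the state component is never inspected, the check accepts the read in the snooping behavior just as it accepts the read during treatment. This is the limitation described above: the method cannot separate two uses of the same action that differ only in context.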

Al-Fedaghi  uses  the  work  of  Byun  et  al.  as  a  starting  point  but  concludes  that  rather  than  associating 
purposes  with  roles,  one  should  associate  purposes  with  sequences  of  actions  [AF07].  Influenced  by  Al- 
Fedaghi,  Jafari  et  al.  adopt  a  similar  position  calling  these  sequences  workflows  [JSNS09].  The  set  of 
workflows  allowed  for  a  purpose  p  corresponds  to  nbehv(rp).  They  do  not  provide  a  formal  method  of 
determining  which  workflows  belong  in  the  allowed  set  leaving  this  determination  to  the  intuition  of  the
auditor.  They  do  not  consider  probabilistic  transitions  and  the  intuition  they  supply  suggests  that  they  would 
only  include  workflows  that  successfully  achieve  or  improve  the  purpose.  Thus,  our  approach  appears  more 
lenient  by  including  some  behaviors  that  fail  to  improve  the  purpose.  As  shown  in  Chapter  5,  this  leniency
is  key  to  capturing  the  semantics  of  purpose  restrictions.  An  auditor  could  encode  a  workflow  in  the  state 
of  the  environment  to  get  results  similar  to  Al-Fedaghi’s  or  Jafari  et  al.’s  results  while  using  Contextual 
Role-Based  Access  Control  [MF03]  or  Situation-Based  Access  Control  [PBDD08]. 

Others  have  adopted  a  hybrid  approach  allowing  the  roles  of  an  auditee  to  change  based  on  the  state  of 
the  system  [PGY08,  EKWB11].  These  dynamic  roles  act  as  a  level  of  indirection  assigning  an  auditee  to  a 
state.  This  indirection  effectively  allows  role-based  access  control  to  simulate  the  workflow  methods  and  so  to  be
just  as  expressive.

Agrawal  et  al.  propose  a  methodology  called  Hippocratic  databases  for  protecting  the  privacy  of  subjects 
of  a  database  [AKSX02].  They  propose  to  use  a  query  intrusion  model  to  enforce  privacy  policies  governing
purposes.  Given  a  request  for  access  and  the  purpose  for  which  the  requester  claims  the  request  is  made,  the 
query  intrusion  model  compares  the  request  to  previous  requests  with  the  same  purpose  using  an  approach 
similar  to  intrusion  detection.  If  the  request  is  sufficiently  different  from  previous  ones,  it  is  flagged  as  a 
possible  violation.  While  the  method  may  be  practical,  it  lacks  soundness  and  completeness.  Furthermore, 
by  not  being  semantically  motivated,  it  provides  no  insight  into  the  semantics  of  purpose.  To  avoid  false 
positives,  the  set  of  allowed  behaviors  nbehv(rp)  would  have  to  be  small  or  have  a  pattern  that  the  query 
intrusion  model  could  recognize. 
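
The following sketch shows one way such a comparison could work. It is not the mechanism of [AKSX02], which we do not reproduce here. The sketch assumes that a request is the set of fields it touches, that difference is measured by Jaccard distance, and that a request with no history is flagged. The threshold, the field names, and the function names are hypothetical.

```python
def jaccard_distance(a, b):
    """One minus the ratio of shared fields to all fields in either request."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def is_suspicious(request, history, threshold=0.5):
    """Flag a request whose nearest past request for the same purpose is too far away."""
    if not history:
        return True  # assumption: nothing to compare against, so flag for review
    return min(jaccard_distance(request, past) for past in history) > threshold


# Hypothetical past requests that claimed the purpose "billing".
billing_history = [{"name", "address", "balance"}, {"name", "balance"}]

usual = is_suspicious({"name", "balance", "address"}, billing_history)
unusual = is_suspicious({"diagnosis", "lab_results"}, billing_history)
```

The sketch flags a request only because it is unlike earlier ones. A legitimate but novel request is flagged, and a violation that imitates past requests is not, which is the lack of soundness and completeness noted above.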

Jif  is  a  language  extension  to  Java  designed  to  enforce  requirements  on  the  flows  of  information  in  a 
program  [CMVZ09].  Hayati  and  Abadi  explain  how  to  reduce  purpose  restrictions  to  information  flow 
properties  that  Jif  can  enforce  [HA05].  Their  method  requires  that  inputs  are  labeled  with  the  purposes  for 
which  the  policy  allows  the  program  to  use  them  and  that  each  unit  of  code  be  labeled  with  the  purposes  for 
which  that  code  operates.  If  information  can  flow  from  an  input  statement  labeled  with  one  purpose  to  code 
labeled  for  a  different  purpose,  their  method  produces  a  compile-time  type  error.  (For  simplicity,  we  ignore 
their  use  of  sub-typing  to  model  sub-purposes.)  In  essence,  their  method  enforces  the  following  rule:  if  information
i  flows  to  code  c,  then  i  and  c  must  be  labeled  with  the  same  purpose.  The  interesting  case  is  when  the
code  c  uses  the  information  i  to  perform  some  observable  action  a_{c,i},  such  as  producing  output.  Under  our
semantics,  we  treat  the  program  as  the  auditee  and  view  the  policy  as  limiting  these  actions.  By  directly
labeling  code,  their  method  does  not  consider  the  contexts  in  which  these  actions  occur.  Rather,  the  action
a_{c,i}  is  either  always  allowed  or  always  disallowed  based  on  the  purpose  labels  of  c  and  i.  By  not  considering
context,  their  method  is  subject  to  the  same  limitations  as  the  method  of  Byun  et  al.  with  the  subset  of  actions  being
equal  to  the  set  of  all  actions  a_{c,i}  such  that  c  and  i  have  the  same  label.  However,  using  more  advanced  type
systems  (e.g.,  typestate  [SY86]),  they  might  be  able  to  extend  their  method  to  consider  the  context  in  which
code  is  executed  and  increase  the  method’s  expressiveness. 
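
The labeling rule itself is simple enough to sketch. The sketch below checks an explicitly listed flow relation at run time, whereas the method of Hayati and Abadi relies on Jif's type checker at compile time; it is therefore only an illustration of the rule, not of their implementation. All labels, input names, and code-unit names are hypothetical.

```python
def label_violations(flows, input_label, code_label):
    """Return the flows (i, c) whose input and code carry different purpose labels."""
    return [(i, c) for i, c in flows if input_label[i] != code_label[c]]


# Hypothetical purpose labels on inputs and on units of code.
input_label = {"email": "billing", "diagnosis": "treatment"}
code_label = {"send_invoice": "billing", "send_ad": "marketing", "plan_care": "treatment"}

# Hypothetical flows of information from inputs to code.
flows = [("email", "send_invoice"), ("email", "send_ad"), ("diagnosis", "plan_care")]

violations = label_violations(flows, input_label, code_label)
```

The verdict for a pair depends only on the two labels, so a given flow is either always accepted or always rejected regardless of the circumstances in which the code runs.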


Commercial  Products.  FairWarning  and  Cerner’s  P2Sentinel  are  commercially  available  auditing  sys¬ 
tems  designed  for  the  hospital  setting  [Fai,  Cer].  They  log  employee  accesses  to  medical  records  by  col¬ 
lecting  the  accesses  across  the  various  computer  systems  a  typical  hospital  uses.  The  systems  come  with 
a  selection  of  queries  over  the  audit  log  that  recognize  common  suspicious  actions  such  as  an  employee 
looking  up  the  record  of  a  celebrity  or  someone  with  the  same  last  name  (a  possible  relative).  They  also 
allow  the  auditor  to  create  his  own  queries.  In  principle,  the  auditor  may  craft  any  query  including  ones  that
embed  our  auditing  algorithms.  Thus,  these  systems  are,  in  theory,  capable  of  enforcing  purpose  restrictions
with  our  semantics. 

In  practice,  auditors  using  these  systems  have  employed  much  simpler  queries.  Thus,  we  compare  our 
auditing  algorithms  to  the  ability  of  these  systems  to  enforce  purpose  restrictions  using  only  the  typical
queries  employed  by  their  users.  For  example,  the  marketing  material  for  FairWarning  highlights  nine
example  queries  (see  http://www.fairwarning.com/subpages/monitoring.asp):

1.  VIP  record  snooping

2.  Executive  record  snooping 

3.  Patient  /  employee  record  snooping 

4.  Family  member  and  self-examination  of  records 

5.  Neighbor  record  snooping 

6.  Identity  Theft 

7.  Medical  Identity  Theft 

8.  Simultaneous  logins  from  addresses  or  terminals 

9.  Repeated  login  failures 

Of  these,  the  last  four  are  authentication  issues  orthogonal  to  purpose  restrictions.  The  first  five  are  examples
of  purpose  violations.  Employees  should  only  access  records  for  valid  purposes  such  as  treatment.  These 
five  examples  are  cases  in  which  an  employee  accesses  a  record  for  the  illegitimate  purpose  of  curiosity 
(i.e.,  to  snoop).  In  each  of  these  examples,  the  illegitimate  purpose  of  curiosity  is  highly  suggested  by  the 
identities  of  the  record’s  user  (the  employee)  and  the  record’s  subject  (the  patient).  For  example,  in  the  case 
of  VIP  record  snooping,  the  patient  being  a  VIP  makes  every  access  to  his  record  suspicious  since  many 
employees  would  be  curious  about  a  VIP’s  condition. 

Under  the  approach  of  these  systems,  to  enforce  a  prohibitive  rule  restricting  behavior  to  being  not  for 
a  purpose  p,  the  auditor  would  select  a  set  of  queries  that  check  whether  an  auditee’s  behavior  lies  in  the
set  nbehv(rp)  and  report  a  violation  if  so.  In  the  above  five  examples,  these  queries  take  the  form  of  checks 
on  the  identities  of  the  record’s  user  and  the  record’s  subject.  Such  identity  checks  are  only  capable  of 
recognizing  sets  nbehv(rp)  where  a  behavior  b  is  either  in  or  not  in  nbehv(rp)  based  on  these  identities  and 
not  how  the  record  is  used  or  any  further  context. 
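
A sketch of such an identity check follows. It is not the query language of FairWarning or P2Sentinel; the log format, the names, and the two rules (VIP patient, shared last name) are hypothetical stand-ins for the kinds of queries listed above.

```python
def flag_accesses(log, vips):
    """Flag accesses to VIP records or to patients sharing the employee's last name."""
    flagged = []
    for employee, patient in log:
        same_family = employee.split()[-1] == patient.split()[-1]
        if patient in vips or same_family:
            flagged.append((employee, patient))
    return flagged


# Hypothetical audit log of (employee, patient) accesses.
log = [
    ("Alice Smith", "Bob Smith"),     # shared last name
    ("Alice Smith", "Carol Jones"),   # no identity-based reason for suspicion
    ("Dan Brown", "Famous Person"),   # VIP patient
]
vips = {"Famous Person"}

flagged = flag_accesses(log, vips)
```

The query sees only who accessed whose record. It would flag Alice Smith even if she were Bob Smith's treating physician, and it would pass her access to Carol Jones even if that access were made out of curiosity.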

More  generally,  the  auditor  may  craft  more  complex  queries  to  recognize  nbehv(rp).  However,  since 
the  auditor  might  not  think  of  all  behaviors  in  nbehv(rp)  or  all  the  queries  needed  to  detect  such  behaviors, 
the  auditor  might  only  recognize  a  subset  of  nbehv(rp),  making  the  method  incomplete.  Like  our  auditing 
methodology,  this  approach  is  also  unsound  in  that  a  behavior  identified  as  being  in  nbehv(rp)  might  actually
be  permissible  by  also  being  in  nbehv(rp')  for  some  allowed  purpose  p'.

To  enforce  an  exclusivity  rule  restricting  behavior  to  being  for  only  the  purpose  p,  the  auditor  would 
select  a  set  of  illegitimate  purposes  that  may  tempt  the  auditee  into  violating  the  restriction.  The  auditor 
would  then  enforce  a  set  of  prohibitive  rules  against  these  illegitimate  purposes.  Whereas  this  approach  looks 
for  suspicious  behaviors,  our  algorithms  verify  that  the  observed  behavior  could  be  for  the  allowed  purpose. 
Thus,  unlike  an  auditor  using  our  methodology,  an  auditor  using  queries  with  a  system  like  FairWarning  or
P2Sentinel  must  determine  all  the  tempting  illegitimate  purposes  or  risk  undetected  violations.  Furthermore, 
as  discussed  above,  enforcing  each  possible  prohibitive  rule  is  neither  sound  nor  complete. 

The  approach  of  FairWarning  and  P2Sentinel  offers  both  advantages  and  disadvantages  compared  to  our 
methodology.  The  commercial  approaches  allow  the  auditor  to  select  those  behaviors  most  concerning  to 
him  for  detection.  While  the  auditor  must  have  a  good  understanding  of  the  threats  facing  the  hospital,  he 
does  not  need  to  formalize  this  knowledge  into  an  environment  model.  Since  each  query  is  independent 
of  the  others,  the  auditor  can  expect  reasonable  performance  even  when  understanding  only  a  subset  of  the 
threats.  Furthermore,  the  auditor  may  add  to  or  refine  the  queries  he  uses  as  his  knowledge  of  the  hospital 
and  the  threats  improves.  Our  method,  on  the  other  hand,  requires  the  auditor  to  formalize  the  functioning 
of  the  hospital  as  an  environment  model.  While  the  auditor  can  refine  this  model  as  the  auditor’s  knowledge 
of  the  hospital  improves,  our  algorithms  require  a  fairly  accurate  model  from  the  beginning  to  avoid  false 
positives  and  false  negatives.  (See  Chapter  4  for  an  example  of  model  refinement  and  its  consequences.) 
The  amount  of  work  involved  in  formally  modeling  the  environment  implies  that  our  methodology  is  less 
appropriate  when  auditing  a  small  number  of  accesses.  However,  our  approach  computes  from  this  model 
all  the  behaviors  that  are  not  for  the  purpose  in  question.  Thus,  while  enforcing  an  exclusivity  rule,  our 
approach  does  not  require  the  auditor  to  understand  the  threats  facing  the  hospital.  Given  these  trade-offs, 
the  commercial  approach  does  comparatively  well  given  an  environment  that  the  auditor  does  not  understand
well  in  comparison  to  the  threats  facing  it;  our  approach  does  comparatively  well  when  the  auditor  understands
the  environment  but  the  environment  faces  emerging  threats  that  the  auditor  knows  less  about. 


7.2  Related  Problems  in  Policy  Enforcement 

We  have  already  covered  the  most  closely  related  work  in  Section  7.1.  Below  we  discuss  work  on  related
problems  in  computer  science. 


Minimal  Disclosure.  The  works  most  similar  to  ours  in  approach  have  been  on  minimal  disclosure,  which 
requires  that  the  amount  of  information  used  in  granting  a  request  for  access  should  be  as  little  as  possible 
while  still  achieving  the  purpose  behind  the  request.  Massacci,  Mylopoulos,  and  Zannone  define  minimal 
disclosure  for  Hippocratic  databases  [MMZ06].  Barth,  Mitchell,  Datta,  and  Sundaram  study  minimal  disclo¬ 
sure  in  the  context  of  workflows  [BMDS07].  They  model  a  workflow  as  meeting  a  utility  goal  if  it  satisfies  a 
temporal  logic  formula.  Minimizing  the  amount  of  information  disclosed  is  similar  to  an  agent  maximizing 
his  reward  and  thereby  not  performing  actions  that  have  costs  but  no  benefits.  However,  we  consider  several 
factors  that  these  works  do  not,  including  quantitative  purposes  that  are  satisfied  to  varying  degrees  and 
probabilistic  behavior  resulting  in  actions  being  for  a  purpose  despite  the  purpose  not  being  achieved,  which 
is  necessary  to  capture  the  semantics  of  purpose  restrictions  (Chapter  5). 


Expressing  Privacy  Policies  with  Purpose.  Work  on  understanding  the  components  of  privacy  policies 
has  shown  that  purpose  is  a  common  component  of  privacy  rules  (see,  e.g.,  [BA05,  BA08]). 

Some  languages  for  specifying  access-control  policies  allow  the  purpose  of  an  action  to  partially  deter¬ 
mine  if  access  is  granted.  For  example,  EPAL  is  a  language  in  which  privacy  policies  are  expressed  by  listing 
all  the  conditions  under  which  a  system  should  grant  a  request  for  access  to  sensitive  resources  [PS03].  These
conditions  may  depend  upon  four  factors:  the  identity  of  the  requester  for  access,  the  resource  requested,  the
action  the  requester  would  like  to  perform  on  the  resource,  and  the  purpose  for  which  the  requester  would 
like  to  perform  the  action.  However,  EPAL  lacks  a  formal  semantics  that  describes  when  an  action  is  for  a 
purpose  and  treats  purposes  as  syntactic  labels.  Rather,  it  depends  on  the  system  making  use  of  the  language 
to  determine  what  actions  are  for  what  purposes  and  provides  no  formal  guidance  as  to  how  the  system 
should  make  this  determination. 
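
The consequence of treating purposes as syntactic labels can be seen in a small sketch. It does not use EPAL's actual syntax; a policy is reduced to a set of permitted four-tuples, and the class name, the function name, and all values are hypothetical.

```python
from typing import NamedTuple, Set


class Request(NamedTuple):
    requester: str
    resource: str
    action: str
    purpose: str  # a label supplied with the request, not something the system verifies


def granted(request: Request, rules: Set[Request]) -> bool:
    """Grant access exactly when the four factors match a listed condition."""
    return request in rules


# Hypothetical policy: physicians may read medical records for treatment.
rules = {Request("physician", "medical_record", "read", "treatment")}

for_treatment = granted(Request("physician", "medical_record", "read", "treatment"), rules)
for_gossip = granted(Request("physician", "medical_record", "read", "gossip"), rules)
```

A physician who intends to gossip but supplies the label treatment submits a request identical to the honest one and is granted access. Deciding whether the label is truthful is exactly the question that requires a semantics of purpose.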

The  Platform  for  Privacy  Preferences  (P3P)  offers  a  language  for  specifying  the  privacy  policies  of 
websites  [Cra02].  These  policies  must  state  the  purposes  for  which  the  website  collects  information.  The 
policy  may  either  reference  one  of  the  predefined  purposes  that  the  language  offers  or  provide  a  custom 
purpose.  The  specification  of  the  language  provides  a  description  of  each  of  the  predefined  purposes  in 
natural  language  [CLM+02].  The  policy  author  must  provide  such  a  description  for  any  custom  purposes  he
uses.  We  hope  our  work  will  provide  a  method  of  formalizing  when  information  use  meets  the  requirements 
of  these  descriptions. 

SPARCLE  is  a  system  for  authoring  and  examining  privacy  policies  [BKKF05,  BKK06].  The  system 
consumes policies written in a restricted form of natural language and parses them into standard components.
The system then allows the user to examine the policy by focusing on different components, edit the
policy,  and  translate  the  policy  into  machine  readable  formats  (e.g.,  EPAL).  One  of  the  standard  components 
SPARCLE  considers  is  purpose.  While  SPARCLE  is  capable  of  identifying  restrictions  on  purpose  in  a 
policy,  it  does  not  assign  a  semantics  to  these  restrictions. 

Hanson  et  al.  provide  an  algebra  for  tracking  the  permissible  uses  of  data  as  it  is  transferred  from  system 
to  system  and  is  combined  with  other  information  [HBLK+07].  However,  this  work  is  not  concerned  with 
the  meaning  of  purpose  or  for. 

7.3  Works  from  Philosophy  and  Psychology 

Philosophy  concerns  defining  the  meaning  of  words.  Philosophers  typically  proceed  by  iteratively  refining 
a  definition  to  match  their  intuitions  about  each  new  example  of  the  word’s  use.  The  experimental  methods 
of  psychology  (defined  broadly  to  include  linguistics  and  cognitive  science)  have  given  rise  to  experimental 
philosophy. This hybrid methodology studies the meaning of words by looking at the most common
view  of  a  population  rather  than  the  intuitions  of  experts.  Our  work  uses  intuition  until  Chapter  5,  which 
presents  a  survey.  Both  philosophy  and  psychology  apply  their  methodologies  to  understanding  the  nature 
of  human  planning.  We  discuss  these  efforts  below. 

Philosophical  Foundations.  Taylor  provides  a  detailed  explanation  of  the  importance  of  planning  to  the 
meaning of purpose, but does not provide any formalism [Tay66]. Taylor concludes that one must distinguish
the purpose of actions from their effects: the effects are the actual results of the actions whereas the
purpose  is  merely  the  desired  effects  (page  216).  Our  model  formalizes  this  distinction  by  allowing  an  action 
to  be  for  a  purpose  despite  that  purpose  not  being  achieved. 

Unlike  Taylor,  most  philosophical  works  use  purpose  in  the  sense  of  the  purpose  of  life,  which  differs 
from  the  sense  in  which  privacy  policies  use  the  word.  Many  works  use  desire  and  motivation  to  refer  to  the 
feelings  and  attitudes  that  cause  an  agent  to  act.  That  is,  desires  correspond  to  purposes  in  our  formalism. 
(However,  the  usage  and  connotations  do  differ:  while  a  jaded  physician  may  order  tests  for  the  purpose  of 
treatment,  we  would  not  say  “he  desires  treatment”  and  probably  not  even  “he  desires  to  provide  treatment”, 
but “he desires to keep his job and get paid” sounds reasonable. Whether these differences have ramifications
for  our  formalism  is  future  work.)  These  works  discuss  desires  to  define  intentions.  Intentions  typically 
refers  to  the  modifications  the  agent  hopes  to  make  to  the  state  of  the  world.  That  is,  in  our  formalism, 
intentions  are  actions  the  agent  plans  to  take  under  the  strategy  it  has  selected.  For  example,  the  desire  for 
satiety  motivates  the  intention  to  go  grocery  shopping. 

The  modern  philosophical  work  in  the  area  of  intentions  starts  with  Anscombe  who  argues  that  the 
intention  of  an  action  is  the  answer  offered  to  the  question  Why  did  you  perform  that  action?  [Ans57]. 

Bratman  builds  on  Anscombe’s  work  by  emphasizing  the  importance  of  agent  planning  in  determining 
intentions  to  create  the  Belief-Desire-Intention  (BDI)  model  [Bra87].  In  Bratman’s  work,  an  intention  is 
an  action  an  agent  plans  to  take  where  the  plan  is  formed  while  attempting  to  maximize  the  satisfaction  of 
the agent’s desires. To some extent, Section 2.1 may be viewed as a formalization of a simplification of
Bratman’s  view.  (The  plans  of  Bratman  are  more  complex  than  our  strategies  to  account  for  the  limited 
reasoning  abilities  of  humans.) 

Using  Bratman’s  work  as  a  starting  point,  Cohen  and  Levesque  present  a  logical  formalization  of  when 
an  agent  intends  to  perform  an  action  or  intends  to  bring  about  a  state  of  affairs  [CL90].  Roughly  speaking, 
under their formalism, an agent intends to satisfy a predicate p over states if and only if the agent has knowingly
performing  a  sequence  of  actions  that  makes  p  true  as  a  goal  that  it  believes  it  can  achieve  and  will  continue 
to  attempt  to  make  p  true  until  it  believes  it  is  impossible  to  do  so.  These  predicates  are  related  to  binary 
purpose  scores,  and  our  formalism  produces  strategies  that  roughly  correspond  to  the  intentional  actions 
of  Cohen  and  Levesque.  However,  our  formalism  also  handles  quantitative  purposes  and  information  use. 
Cohen  and  Levesque  comment  on  the  existence  of  quantitative  purposes  and  propose  to  model  them  as  a 
series  of  intentions,  but  do  not  provide  a  formalism  to  do  so. 

Intentions  also  affect  planning  and  will  become  important  as  we  search  for  more  accurate  models  of 
human planning. Roy uses logics and game theory to formalize how intentions can affect an agent’s planning
[Roy08]. He uses his formalism to study when an agent’s plan is rational given the agent’s intentions.
Given  the  auditee’s  intentions,  we  could  replace  our  MDP  formalism  of  planning  with  Roy’s  intention-driven 
formalism. 

By  modeling  auditees  as  planning  agents  with  desires  (purposes)  and  beliefs,  the  auditor  has  adopted 
what  Dennett  calls  the  intentional  stance  toward  them  [Den87].  While  our  algorithms  assume  just  the  basic 
intentional  stance  that  the  auditor  may  accurately  model  auditees  as  planning  agents,  our  discussions  of 
issues such as tenable deniability suggest the stronger view that auditees actually plan and have desires
and  beliefs.  (See  page  34  of  [Den87]  for  a  discussion  of  the  difference.)  Applying  the  intentional  stance  to 
auditees  such  as  computer  programs  or  whole  organizations  is  less  familiar  than  applying  it  to  humans,  but 
fits  Dennett’s  scheme.  However,  the  case  of  computer  programs  is  problematic  when  the  point  of  auditing 
is  to  assign  moral  blame  or  punishment:  computer  programs  are  not  moral  entities  and  will  not  respond  to 
punishment.  In  such  cases,  the  auditor  should  consider  the  programmer,  not  the  program,  as  the  auditee. 
For  example,  in  Section  3.2.2,  we  create  a  POMDP  for  a  website  to  determine  how  it  uses  information  for 
advertising.  In  the  example,  we  audited  “the  website”,  which  could  either  refer  to  the  program  running  the 
website  or  the  programmers  of  the  website.  While  we  can  model  this  program  as  having  a  purpose,  it  is 
unreasonable  to  punish  it.  Thus,  the  auditee  is  better  understood  as  the  programmers  (an  organization)  of 
the  website,  whose  behavior  is  codified  by  the  program.  Under  this  view,  the  program  may  be  identified 
with  the  strategy  of  the  website  programmers. 

Causality. Our treatment of for in Section 2.1 is motivated by the counterfactual definition of causality.
This definition requires that, for an action to cause an effect, both the effect actually occurs and the
effect might not have occurred if the action did not occur. For example, Mackie defines a cause to be insufficient
and non-redundant parts of unnecessary but sufficient causes (INUS conditions) for an effect [Mac74].
Mackie models causes and effects as facts. Working with sets of causes, this means that a fact c is a cause
of an effect e if there exists a set C containing c such that C is sufficient to entail e (sufficiency) and no
proper subset of C is sufficient to entail e (non-redundancy).
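
The set-based reading of sufficiency and non-redundancy can be sketched as a brute-force check. This is an illustrative sketch only: the `entails` predicate, the `SUFFICIENT` table, and the example facts are hypothetical, and the enumeration of subsets is exponential in the size of C.

```python
from itertools import combinations

# Hypothetical world: the effect follows from either of these fact sets.
SUFFICIENT = [
    frozenset({"short_circuit", "flammable_material"}),
    frozenset({"lightning"}),
]


def entails(facts):
    """True if the given facts are sufficient for the effect (assumed model)."""
    return any(s <= frozenset(facts) for s in SUFFICIENT)


def is_minimal_sufficient(C, entails):
    """C entails the effect (sufficiency) and no proper subset of C does
    (non-redundancy)."""
    C = frozenset(C)
    if not entails(C):
        return False
    # range(len(C)) yields sizes 0 .. len(C) - 1, i.e. proper subsets only.
    return not any(
        entails(frozenset(sub))
        for size in range(len(C))
        for sub in combinations(C, size)
    )


def is_cause(c, candidate_sets, entails):
    """c is a cause if it belongs to some sufficient, non-redundant set."""
    return any(
        c in C and is_minimal_sufficient(C, entails) for C in candidate_sets
    )
```

Under these assumptions, `short_circuit` is a cause via the set containing it and `flammable_material`, whereas adding an irrelevant fact such as `rain` to that set makes the set redundant, so `rain` is not a cause through it.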

We  borrow  the  notion  of  non-redundancy  from  Mackie’s  definition  of  causality.  Roughly  speaking,  we 
replace  the  causes  with  actions  and  the  effect  with  a  purpose.  The  extension  to  our  semantics  proposed  in 
Section 6.3 may be seen as another instance of non-redundancy. This time, we replace the causes with
purposes  and  the  effect  with  an  action.  This  suggests  that  for  an  action  to  be  for  a  purpose,  we  expect 
both  that  the  action  was  non-redundant  for  improving  that  purpose  and  that  the  purpose  was  non-redundant 
in  motivating  the  action.  That  is,  we  expect  planning  to  be  parsimonious.  We  then  use  counterfactual 
reasoning  to  determine  which  purposes  motivate  the  action. 


Experimental  Philosophy.  Experimental  philosophy  has  found  some  inconsistencies  in  how  people  tend 
to  use  the  word  “intent”  called  the  Knobe  effect  [Kno03].  When  it  comes  to  benefits  for  purposes  that  are 
good,  people  tend  to  only  say  that  the  actor  intended  for  the  benefits  if  the  actor  selected  his  action  taking  the 
purpose  into  consideration,  which  agrees  with  our  model.  However,  when  it  comes  to  bad  purposes,  people 
tend  to  say  that  the  actor  intended  for  the  (bad)  benefits  even  if  the  actor  did  not  select  his  action  with  the 
goal  of  achieving  the  bad  purpose  in  mind,  which  disagrees  with  our  model.  (See  [Fel08]  for  a  survey.) 


Human  Planning.  Psychological  studies  have  produced  models  of  human  thought  (see,  e.g.,  [ABB+04]). 
However,  these  are  too  low-level  and  incomplete  for  our  needs  [DKP96].  The  GOMS  (Goals,  Operators, 
Methods,  and  Selection  rules)  formalism  provides  a  higher  level  model,  but  is  limited  to  selecting  behavior 
using  simple  planning  approaches  [CMN83,  JK96].  Simon’s  approach  of  bounded  rationality  [Sim55]  and 
related  heuristic-based  approaches  [GS02]  model  more  complex  planning,  but  with  less  precise  predictions. 


7.4  Related  Algorithms 

Plan  Recognition.  Attempting  to  infer  the  plan  that  an  agent  has  while  performing  an  action  is  plan 
recognition [SSG78]. Plan recognition may predict the future actions of agents, allowing systems to anticipate
them. Often, plan recognition algorithms model how “low-level” actions contribute to achieving a
“top-level”  action  that  is  done  for  its  own  sake  (see,  e.g.,  [KA86]).  These  top-level  actions  are  similar  to 
purposes.  However,  our  auditing  algorithm  checks  whether  a  sequence  of  actions  is  consistent  with  a  given 
purpose  rather  than  attempting  to  predict  the  most  likely  purpose  or  plan  motivating  the  actions. 

The  work  billed  as  plan  recognition  most  closely  related  to  our  work  is  by  Ramirez  and  Geffner  [RG09, 
RG10,  RG11].  They  approach  the  problem  by  first  attempting  to  determine  what  goal  state  the  agent  is 
attempting  to  reach  rather  than  attempting  to  recognize  the  agent’s  plan  directly.  Thus,  this  has  more  in 
common  with  work  on  goal  inference  than  standard  approaches  to  plan  recognition  and  we  discuss  it  with 
the  other  works  on  goal  inference. 

Most work on plan recognition assumes that the agent is not attempting to mislead the plan recognizer
since  they  are  designed  to  aid  cooperation  with  the  agent.  Our  work  is  related  to  work  on  adversarial  plan 
detection  [AFFH86]. 

Particularly related is the work of Geib and Goldman, who use adversarial plan recognition to aid intrusion
detection [GG01]. Similar to standard works, they model plans as a graph that represents a space
of  possible  plans.  Nodes  of  the  graph  represent  actions  and  directed  edges  represent  the  order  in  which 
the  adversary  must  perform  the  actions.  Intrusions  are  paths  in  the  graph  from  an  initial  node  to  a  goal 
node.  However,  unlike  most  work  on  plan  recognition,  owing  to  the  hostile  nature  of  the  actor,  they  do  not 
assume  that  all  relevant  actions  are  observable.  Thus,  rather  than  simply  comparing  the  observed  actions  to 
paths  in  the  graph  to  determine  possible  plans,  their  recognition  algorithm  also  considers  unobserved  actions 
consistent  with  the  state  of  the  system  that  the  adversary  might  have  performed. 

Relatedly,  Cuppens,  Autrel,  Miege,  and  Benferhat  attempt  to  recognize  malicious  intentions  for  intrusion 
detection  [CAMB02].  They  model  attacks  as  consisting  of  multiple  actions  each  with  pre-conditions  and 
post-conditions.  An  adversary  attempts  to  perform  a  malicious  action  by  first  performing  all  the  suspicious 
actions  needed  to  enable  the  pre-condition  of  the  malicious  action.  Their  approach  is  to  observe  these 
suspicious  actions  and  predict  from  their  model  what  other  actions  the  adversary  might  have  performed 
or  will  be  performing.  In  particular,  they  try  to  predict  which  (if  any)  malicious  action  the  adversary  is 
attempting  to  perform  using  a  shortest  path  heuristic.  The  distinction  between  suspicious  and  malicious 
actions  does  not  apply  to  our  work  since  we  consider  purposes,  not  actions,  to  be  malicious.  Indeed,  in  our 
setting  many  actions,  such  as  looking  up  a  medical  record,  could  be  either  acceptable  or  malicious  depending 
upon  the  context. 

The  models  of  planning  used  in  both  of  these  works  differ  from  ours  in  two  ways.  First,  we  model 
purposes  quantitatively  instead  of  qualitatively.  Second,  our  work  considers  probabilistic  effects  of  the 
environment  that  might  cause  the  agent  to  fail  to  achieve  its  plan. 


Goal  Inference.  Works  on  goal  inference  attempt  to  determine  from  observations  of  an  agent’s  actions 
what  goal  (typically,  a  goal  state)  the  agent  is  attempting  to  reach.  Identifying  purposes  with  goals,  our  work 
uses  goal  inference  to  enforce  purpose  restrictions. 

The  goal  inference  works  most  closely  related  to  ours  are  those  using  planning  models  similar  to  ours 
to  compare  the  agent’s  actions  to  the  actions  that  the  agent  would  plan  to  perform  given  various  possible 
goals. Early work by Rao, Shon, and Meltzoff motivates such reasoning by comparing such a process using
a Bayesian graphical model to models of infant imitation [RSM04]. Shon, Grimes, Baker, and Rao consider
an  algorithm  in  such  a  setting  [SGBR04].  While  the  Bayesian  graphical  models  used  in  these  works  can 
represent  partial  observations,  like  POMDPs,  they  presuppose  a  fixed  upper-bound  on  the  number  of  actions 
the  agent  can  perform. 

Verma  and  Rao  consider  alternative  algorithms  and  extend  the  model  to  include  a  distinguished  action 
StayPut  that  is  similar  to  our  distinguished  action  stop  [VR05,  VR06].  Unlike  our  irrevocable  action  stop, 
an  agent  may  perform  StayPut  before  other  actions.  However,  they  presume  a  prior  probability  distribution 
(i.e.,  prior  to  observing  the  actions  the  agent  takes)  over  actions  that  makes  StayPut  very  unlikely  before 
the  agent  reaches  the  goal  state  and  equally  likely  to  other  actions  afterwards.  While  this  presumption 
is  sufficient  for  their  goal  of  ensuring  that  the  agent  takes  the  shortest  path  to  the  goal  state  (with  high 
probability),  it  does  not  satisfy  our  stronger  goal  that  the  agent  also  only  performs  some  no-op  action  once 
the  goal  is  reached  (or,  under  our  model,  once  no  further  reward  is  possible). 

The work of Baker, Saxe, and Tenenbaum focuses on the planning aspect of goal inference and attempts
to ensure that they use a model of planning simple enough to correspond to human planning [BTS06]. They
extend their model to an MDP model similar to ours [BTS07, BST09]. Under these MDP models, rather than
having a reward function, the agent attempts to reduce the costs of reaching a goal state. For each possible
goal state, their algorithms use the degree to which the agent’s actions minimize the costs of reaching the
goal  state  to  assign  a  probability  to  that  goal  state  being  the  one  pursued  by  the  agent.  Our  reward  functions 
are  similar  to  the  negation  of  their  cost  functions,  but  these  works  predict  which  goal  state  the  agent  is 
pursuing  rather  than  which  cost  function  it  is  using.  Furthermore,  rather  than  determine  whether  an  action 
is  optimal  for  a  purpose,  they  assign  a  probability  to  the  action  that  is  proportional  to  how  close  to  optimal  it 
is.  To  validate  their  models,  they  performed  experiments  showing  participants  animated  characters  moving 
around  a  scene  with  various  possible  goal  destinations.  They  compared  the  goals  the  participants  assigned 
the  animated  characters  to  the  most  probable  goal  produced  by  their  models. 
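
The contrast between assigning probabilities to goals and checking strict optimality can be sketched as follows. The softmax (Boltzmann) weighting, the `beta` parameter, and every name below are assumptions made for illustration and are not taken from [BTS07, BST09]; the strict check is a simplified stand-in that ignores non-redundancy and information use.

```python
import math


def action_likelihoods(q_values, beta=1.0):
    """Softmax over action qualities: near-optimal actions get more weight."""
    best = max(q_values.values())
    weights = {a: math.exp(beta * (q - best)) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def goal_posterior(observed, q_by_goal, prior, beta=1.0):
    """Posterior over goals given observed (state, action) pairs."""
    scores = {}
    for goal, p in prior.items():
        likelihood = p
        for state, action in observed:
            likelihood *= action_likelihoods(q_by_goal[goal][state], beta)[action]
        scores[goal] = likelihood
    total = sum(scores.values())
    return {goal: s / total for goal, s in scores.items()}


def strictly_consistent(observed, q_for_goal, tol=1e-9):
    """Strict check: every observed action must be optimal for the goal."""
    return all(
        q_for_goal[state][action] >= max(q_for_goal[state].values()) - tol
        for state, action in observed
    )


# Hypothetical one-state example with two candidate goals.
Q = {
    "A": {"s0": {"left": 1.0, "right": 0.0}},
    "B": {"s0": {"left": 0.0, "right": 1.0}},
}
observed = [("s0", "left")]
posterior = goal_posterior(observed, Q, {"A": 0.5, "B": 0.5})
```

In this toy example the posterior favors goal A but still gives goal B a non-zero probability, whereas the strict check accepts A and rejects B outright.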

Baker, Saxe, and Tenenbaum generalize their methods for MDPs to POMDPs and perform similar experiments
to validate the generalized model [BST11]. After performing goal inference using the Strips model
of planning [RG09, RG10], Ramirez and Geffner also adopt a POMDP model [RG11]. The POMDP model of
Ramirez and Geffner uses costs, goal states, and a relaxed sense of optimality similar
to the MDP models of Baker et al. The POMDP model of Baker et al. is more closely related to our models.
This  model  uses  a  reward  function  on  states  and  actions  like  ours.  Like  their  previous  works,  they  do  not 
require  that  an  action  be  optimal  to  be  for  a  purpose  and  assign  a  probability  to  an  action  based  on  how 
close to optimal it is. However, rather than setting the probability of an action to be proportional to the quality
of the action (as computed by Q*), they set it to an approximation of the quality (Hauskrecht’s look-ahead
approximation QLH [Hau00]). Our POMDP algorithm for auditing is similar to their algorithm. However,
to  maintain  soundness,  our  algorithm  accounts  for  the  error  of  approximate  POMDP  solving.  Furthermore, 
their  algorithms  may  assign  a  non-zero  probability  to  a  goal  (or  purpose)  even  if  the  agent’s  actions  are 
inconsistent  with  pursuing  that  goal  under  our  strict  definition.  Lastly,  they  do  not  consider  non-redundancy 
nor  information  use. 

Baker et al. formalize a simple form of sequential planning for an MDP model with multiple goals
that  is  similar  to  the  sequential  planning  discussed  in  Section  6.2  [BTS07,  BST09].  However,  they  do  not 
support  intentions  from  previous  goals  affecting  future  goals.  Blokpoel,  Kwisthout,  van  der  Weide,  and 
Rooij  examine  extending  this  model  to  handle  simultaneous  purposes,  which  we  discuss  in  Section  6.3,  but 
do not consider intentions [BKvdWvR10]. Ullman, Baker, Macindoe, Evans, Goodman, and Tenenbaum
extend this line of work for handling social interactions (see, e.g., [UBM+09]).

Also  related  is  the  work  of  Mao  and  Gratch  [MG04].  While  it  differs  from  our  work  in  the  same  ways 
as  the  work  of  Baker  et  al.,  it  also  differs  in  that  rewards  track  how  much  the  agent  wants  to  achieve  the  goal 
rather  than  the  degree  of  satisfaction  of  the  goal. 


Automated Planning. Decision-theoretic planning is planning to optimize some criteria, such as a purpose.
(Blythe provides a survey [Bly99].) Optimizing MDPs or POMDPs to create plans are just two
instances  of  decision-theoretic  planning.  Other  instances  may  be  more  accurate,  convenient,  or  general 
models  of  human  planning. 

For  example,  due  to  uncertainty  the  auditor  may  have  about  the  model  used  by  the  auditee,  we  are 
interested  in  environment  models  that  are  like  MDPs  but  without  fixed  probabilities  assigned  to  transitions. 
Discrete-time  Markov  chains  without  fixed  probabilities  are  known  as  interval-valued,  discrete-time  Markov 
chains (IDTMCs). The form of IDTMC most similar to our model is the Uncertain Markov Chain (UMC)
model  [JL91].  We  hope  the  algorithm  of  Sen  et  al.  [SVA06]  for  a  model  checking  problem  related  to  UMCs 
may  shed  light  on  how  to  generalize  our  algorithm  found  in  Section  2.3. 

The  POMDP  model  presented  in  Chapter  3  provides  a  model  for  planning  as  the  auditee  over  time  gains 
a  better  understanding  of  the  environment.  While  the  POMDP  model  allows  the  auditee’s  beliefs  about  its 
current  state  to  change  over  time,  the  model  itself  is  static.  Bethke,  Bertuccelli,  and  How  present  an  adaptive 
MDP  model  that  changes  over  time  based  on  the  auditee’s  current  understanding  of  the  model  [BBH08].  We 
do  not  consider  such  adaptive  models. 

Chapter 8


Conclusions  and  Future  Work 


8.1  Conclusion 

We use planning to create the first formal semantics for determining when a sequence of actions or information
use is allowed under a purpose restriction. In particular, our formalism uses models similar to MDPs
and POMDPs for planning, which allows us to automate auditing for both exclusivity and prohibitive purpose
restrictions (Chapters 2 and 3). We have provided auditing algorithms and an implementation based
on our formalism (Sections 2.3, 2.4, and 3.5).

We  validate  that  our  approach  based  on  planning  accurately  captures  the  meaning  of  purpose  restrictions 
with  numerous  intuitive  examples  (Sections  2.1.3,  2.2.2,  2.2.3,  3.3.3,  and  3.3.2  and  Chapter  4)  and  an 
empirical  study  of  how  people  understand  the  word  “purpose”  in  the  context  of  privacy  policy  enforcement 
(Chapter  5). 

We  apply  our  formalism  to  understand  the  ramifications  of  Regional  Health  Information  Organizations 
(Chapter 4). Furthermore, we use our formalism to explain and compare previous methods of policy enforcement
in terms of a formal semantics (Section 7.1). Our formalism highlights that an action can be
for  a  purpose  even  if  that  purpose  is  never  achieved,  a  point  present  in  philosophical  works  on  the  subject 
(e.g.,  [Tay66]),  but  whose  ramifications  on  policy  enforcement  had  been  unexplored. 

These  contributions  lead  us  to  conclude  our  thesis: 

A model of planning underlies a formalization of purpose restrictions that enables their automated
enforcement.

However,  we  recognize  the  limitations  of  our  formalism:  it  imperfectly  models  human  planning  and  only 
captures  some  forms  of  planning  for  multiple  purposes  (Chapter  6).  Nevertheless,  we  believe  the  essence 
of  our  work  is  correct:  an  action  is  for  a  purpose  if  the  actor  selects  to  perform  that  action  while  planning 
for  the  purpose.  Future  work  will  instantiate  our  semantic  framework  with  more  complete  models  of  human 
planning. 

8.2  Future  Work 

Future work may improve the accuracy, practicality, or generality of our formalism. Unfortunately,
these three directions may not lead to the same place: improving the accuracy or generality of the formalism
may make it harder to use in practice. Future work may also apply our formalism. We consider these four
areas  in  turn. 


8.2.1  Improving  Accuracy 

While  Chapter  5  presented  a  survey  showing  that  a  relation  exists  between  planning  and  purpose  restrictions, 
more  studies  will  clarify  this  relation.  We  would  like  to  study  whether  exclusivity  rules  require  the  agent  to 
actually  follow  a  plan  for  the  allowed  purpose  (Hypothesis  H2)  or  whether  the  agent  just  needs  to  perform 
actions consistent with such a plan (Hypothesis H3). We would also like to know whether people consider
an  incompetent  agent  in  violation  of  a  purpose  restriction  when  the  agent  thinks  it  is  performing  actions 
consistent  with  a  plan  for  a  purpose  but  is  mistaken.  As  discussed  in  Section  6.4,  a  better  model  of  human 
planning  may  improve  the  accuracy  of  our  predictions  and  allow  us  to  generalize  our  formalism  to  multiple 
purposes. 


8.2.2  Furthering  Practicality 

Under  our  formulation  of  purpose  restrictions,  to  check  whether  a  system  obeys  a  policy,  an  auditor  must  not 
only  model  the  system  and  policy,  but  also  how  the  system  could  have  behaved  with  an  environment  model. 
Auditors can often automatically extract system models from source code (see, e.g., [CGP00]) or create
system  models  from  descriptions  of  operating  procedures.  However,  in  many  cases,  the  auditor  will  have  to 
create  the  environment  model  by  hand  after  researching  all  the  possible  behaviors.  Given  the  difficulty  of 
this  task,  we  desire  methods  of  finding  policy  violations  that  do  not  require  a  full  environment  model  even 
if  they  are  only  approximately  correct. 

Creating  environment  models  can  be  difficult  even  if  the  auditor  has  an  intuitive  idea  of  its  form.  Thus, 
a  tool  that  describes  the  possible  environment  models  consistent  with  a  policy  and  a  specification  of  the 
system  would  be  useful.  For  an  auditor  to  find  such  a  tool  useful,  he  must  be  able  to  understand  the  tool’s 
output.  An  interactive  query  engine  might  aid  the  auditor  in  understanding  the  results. 

Often systems, such as those storing electronic medical records, have no model but produce logs detailing
their use. For such systems, the auditor may be able to learn the environment model from the logs.
Previous work on reinforcement learning, such as Q-learning [Wat89], optimizes MDPs using observations
of their behavior, often available from logs, instead of the MDP model itself. Similarly, SARSA can allow
POMDP optimization without the model itself [RN94]. Bauer constructs libraries of possible plans from
observed sequences of actions [Bau98]. Process mining attempts to create a model of the processes used at
an organization from logs [WvdA01].
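
As a rough illustration of how logged observations could stand in for an explicit model, the sketch below applies the standard tabular Q-learning update to a replayed log. The log format `(state, action, reward, next_state)`, the state and action names, the reward values, and the repeated sweeps over the log are all assumptions made for this example; real audit logs would first have to be mapped into such a form.

```python
from collections import defaultdict


def q_learning_from_log(log, actions, alpha=0.5, gamma=0.9, sweeps=200):
    """Estimate action qualities by replaying logged transitions.

    Each entry is (state, action, reward, next_state); next_state is None
    when the episode ends. No transition probabilities are needed.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in log:
            if next_state is None:
                best_next = 0.0
            else:
                best_next = max(q[(next_state, b)] for b in actions)
            # Standard Q-learning update toward the one-step target.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


# Hypothetical log of a physician's recorded behavior.
ACTIONS = ["order_test", "treat"]
LOG = [
    ("start", "order_test", 0.0, "has_result"),
    ("has_result", "treat", 1.0, None),
    ("start", "treat", 0.5, None),
]
q = q_learning_from_log(LOG, ACTIONS)
```

With these made-up rewards, ordering the test first ends up with a higher estimated quality at `start` than treating immediately, without the auditor ever writing down the transition function.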

We  hope  these  techniques  may  aid  future  research  on  auditing  without  a  pre-created  environment  model. 
We  envision  a  system  using  Experience-Based  Access  Management  (EBAM),  which  attempts  to  iteratively 
construct a model for access control as a system is operating [GLM11]. Over time, EBAM refines a model
based  upon  an  auditor  identifying  results  that  are  false  positives  or  false  negatives.  We  see  our  work  fitting 
into  this  framework  by  providing  a  formalism  for  the  model  with  a  semantics  that  allows  the  system  to 
generalize  from  its  experiences.  This  approach  would  be  similar  to  how  Zhang  et  al.  use  Role-Based  Access 
Control as a formalism for EBAM models [ZGL+11].


8.2.3  Generalizations 


Interactions of Multiple Purposes and Agents. Most pressingly, we would like to address the interactions
among multiple purposes discussed in Chapter 6.

We  would  also  like  to  study  how  the  definition  extends  to  interactions  among  multiple  agents.  By 
employing MDPs, we implicitly limited the agent’s ability to reason about how other agents may behave.
To allow strategic reasoning, we would have to adopt game theoretic models in the place of MDPs
(see,  e.g.,  [OR94]).  Work  on  goal  inference  during  social  interactions  could  also  prove  relevant  (see, 
e.g.,  [UBM+09]). 

Interaction with Obligations. Many privacy policies contain obligations: requirements imposed upon
the  system  as  the  result  of  performing  some  action  (see,  e.g.,  [DFK07]).  For  example,  a  company  may  be 
required  to  delete  a  credit  card  number  six  months  after  completing  a  transaction.  Designing  a  system  to  obey 
obligations  is  complicated  since  the  system  must  discharge  some  obligations,  such  as  the  one  above,  after 
the  use  that  triggered  them.  Thus,  systems  may  benefit  from  automated  methods  for  enforcing  obligations. 

Obligations may have a non-trivial interaction with purposes. For example, consider a policy that requires
that funds only be used to purchase books. Suppose that an employee purchases a book with a credit
card  and  later  uses  funds  to  pay  the  credit  card  bill.  Many  would  argue  that  the  employee  obeyed  the  policy 
despite  not  strictly  using  the  funds  to  purchase  the  book  since  paying  the  bill  was  an  obligation  incurred  for 
the  purpose  of  purchasing  the  book.  We  would  like  our  semantics  to  explain  such  circumstances. 

Context-Dependent Purposes. In some cases a purpose restriction might only indirectly refer to a purpose.
For example, the policy of the website of the U.S. Social Security Administration states [U.S10b]:

By providing your personal information, you give us consent to use the information only for the
purpose for which it was collected. We describe those purposes when we collect information.

Enforcing  such  policies  may  pose  difficulties  for  the  auditor  since  the  policy  refers  to  a  purpose  not  by  name 
but  rather  by  the  context  in  which  the  website  collected  the  information.  While  the  auditor  may  use  our 
semantics  and  algorithms  after  determining  the  appropriate  purposes  from  the  context  in  which  the  website 
collected  the  information,  we  desire  methods  to  make  this  task  easier. 

More  burdensome  still,  in  some  cases,  the  auditor  might  not  have  the  domain  expertise  to  specify  the 
meaning  of  a  purpose  even  after  determining  it.  For  example,  the  Facebook  policy  governing  Facebook 
applications  states,  “You  will  only  request  the  data  you  need  to  operate  your  application.”  and  “A  user’s 
friends’ data can only be used in the context of the user’s experience on your application.” [Fac12]. While
these  are  not  explicitly  purpose  restrictions,  the  auditor  may  view  them  as  implying  a  purpose  restriction: 
An  application  will  collect  and  use  Facebook  user  data  only  for  the  purpose  that  the  application  serves. 
If  Facebook  were  to  attempt  to  enforce  this  purpose  restriction,  it  would  need  to  determine  the  purpose  of 
the  application,  which  might  be  unclear  to  the  auditor  working  for  Facebook.  Facebook  could  require  the 
application  developer  to  provide  the  purpose  and  even  the  environment  model  to  the  auditor.  In  this  case, 
the  auditor  may  use  our  methodology  to  audit  the  application  under  the  provided  model.  However,  the 
auditor  might  still  not  have  enough  information  to  judge  the  accuracy  or  appropriateness  of  the  developer’s 
representations.  We  consider  developing  methods  of  enforcing  purpose  restrictions  that  reference  context- 
dependent  purposes,  such  as  this  one,  an  interesting  direction  for  future  work. 




8.2.4  Applications 


We  would  like  to  conduct  a  case  study  to  determine  the  applicability  of  our  semantics  to  a  real  system. 
Possible systems to study include the admissions process for graduate school and the privacy practices of a hospital.
This  exercise  will  aid  us  in  understanding  the  process  of  creating  complex  environment  models. 

Our approach to auditing considered the policy and environment to be fixed entities that are passively modeled.
However,  our  formalism  can  shed  light  on  how  organizations  may  design  their  systems  to  comply  with 
policies  or  create  enforceable  policies.  Since  the  agent  will  attempt  to  maximize  his  personal  benefits, 
a  system  designer  must  assure  that  these  align  with  the  purposes  that  the  system  would  like  to  further. 
Mechanisms  such  as  punishments,  which  decrease  personal  benefits  for  disallowed  actions,  can  help  align 
these  benefits.  Mechanisms  forcing  an  auditee  to  declare  its  strategy  can  make  violations  easier  to  detect. 

The idea of minimum necessary shows up in HIPAA [U.S10a]:

The  Privacy  Rule  generally  requires  covered  entities  to  take  reasonable  steps  to  limit  the  use 
or  disclosure  of,  and  requests  for,  protected  health  information  to  the  minimum  necessary  to 
accomplish  the  intended  purpose. 

We would like to apply our formalism to minimum necessary use, disclosure, and requests. We could then
compare an approach based on our formalism of purpose to previous work on minimum disclosure [MMZ06,
BMDS07]. In particular, we would like to study how to represent one system using more information than another
with  the  equivalence  relations  =  on  information  used  in  Chapter  3.  We  are  also  interested  in  formalizing 
the  idea  of  “accomplishing”  a  qualitative  purpose  that  can  always  be  further  satisfied,  such  as  marketing  or 
profit. 

8.3  Perspective 

Fundamentally,  our  work  shows  the  difficulties  of  enforcing  purpose  restrictions  due  to  issues  such  as  the 
tenable  deniability  of  ulterior  motives  (Sections  2.2.2  and  2.2.3).  These  difficulties  justify  policies  proscrib¬ 
ing  conflicts  of  interest  and  requiring  the  separation  of  duties  despite  possibly  causing  inefficiencies.  For 
example,  many  hospitals  would  err  on  the  side  of  caution  and  disallow  a  referral  from  a  physician  to  his  own 
private  practice  or  require  a  second  opinion  to  do  so,  thereby  restraining  the  ulterior  motive  of  profit.  Indeed, 
despite  the  intuition  that  privacy  is  security  with  a  purpose,  due  to  these  difficulties,  purpose  possibly  plays 
the  role  of  guidance  in  crafting  more  operational  internal  policies  that  organizations  enforce  rather  than  the 
role  of  a  direct  input  to  the  formal  auditing  process  itself.  In  light  of  this  possibility,  one  may  view  our  work 
as  a  way  to  judge  the  quality  of  these  operational  policies  relevant  to  the  intent  of  the  purpose  restrictions 
found  in  the  actual  privacy  policy. 

However,  we  do  view  purpose  restrictions  as  important  to  privacy  policies.  We  believe  that  privacy 
policies  are  best  thought  of  as  a  balancing  act  between  the  security  of  an  information  subject  and  the  utility 
of  an  information  holder.  While  abuse  of  the  information  in  question  harms  the  information  subject,  the 
information  holder  controls  use  of  the  information.  Traditional  data  security  typically  envisions  an  entity 
that  holds  information  about  itself.  As  the  information  holder  is  one  and  the  same  as  the  information  subject, 
the  entity  can  internally  balance  its  competing  concerns  of  wanting  to  make  use  of  the  information  while 
also  protecting  it. 

In  the  case  of  privacy,  on  the  other  hand,  the  information  holder  is  not  the  information  subject.  For 
example,  while  a  hospital  stores  a  medical  record  and  is  in  charge  of  protecting  it,  it  is  the  patient  that  bears 
the direct harm if that record becomes public. Indeed, the hospital might even stand to profit from sharing
the record with entities (such as insurance companies) that the patient may deem adversarial.

Thus,  we  view  privacy  policies  as  a  balancing  act  between  the  information  subject  and  the  information 
holder.  The  role  of  privacy  policies  and  laws  requiring  privacy  precautions  (such  as  HIPAA)  is  to  strike  a 
balance  between  these  competing  concerns.  Purpose  plays  a  role  by  restricting  the  actions  for  which  the 
information  holder  may  use  the  information  to  only  those  where  the  benefit  has  been  deemed  by  the  policy 
maker  as  commensurate  with  the  risks  to  the  information  subject. 

That  purpose  restrictions  do  not  actually  reference  this  balancing  act  limits  their  ability  to  provide  pri¬ 
vacy.  Indeed,  under  our  formalism,  an  action  could  be  for  a  purpose  even  if  that  action  is  part  of  a  plan 
that  barely  improves  the  purpose  while  having  grave  privacy  implications.  In  light  of  this,  we  believe  that 
policies  should  consider  the  actual  balancing  of  competing  concerns.  Unfortunately,  whereas  we  can  easily 
express purpose restrictions, the trade-offs between two different goals, such as treatment and patient
privacy, are often difficult to compare. For this reason, we expect purpose restrictions to continue to play a
key role in privacy policies.
Appendix A


Further  Background  on  POMDPs 

A.1 Details of the Belief MDPs

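First, we reduce $\tau(\beta, a)(\beta')$ for $\mathrm{bmdp}(m)$ to an expression in terms of the features of the POMDP $m$. Doing
so involves using Kronecker's delta: $\delta(x, y)$ is equal to 1 if $x = y$ and 0 otherwise. The reduction also uses
the set $\mathcal{O}'_m(\beta, a)$ of observations that are possible under the belief state $\beta$ after performing the action $a$:

$\mathcal{O}'_m(\beta, a) = \{\, o \in \mathcal{O} \mid \Pr[O=o \mid B=\beta, A=a] \neq 0 \,\}$. The reduction is as follows:

\begin{align*}
\tau(\beta, a)(\beta') &= \Pr[B'=\beta' \mid B=\beta, A=a] \tag{A.1} \\
&= \sum_{o \in \mathcal{O}} \Pr[B'=\beta' \wedge O=o \mid B=\beta, A=a] \tag{A.2} \\
&= \sum_{o \in \mathcal{O}'_m(\beta, a)} \Pr[B'=\beta' \wedge O=o \mid B=\beta, A=a] \tag{A.3} \\
&= \sum_{o \in \mathcal{O}'_m(\beta, a)} \Pr[B'=\beta' \mid B=\beta, A=a, O=o] \, \Pr[O=o \mid B=\beta, A=a] \tag{A.4} \\
&= \sum_{o \in \mathcal{O}'_m(\beta, a)} \delta(\beta', \mathrm{update}_m(\beta, a, o)) \, \Pr[O=o \mid B=\beta, A=a] \tag{A.5} \\
&= \sum_{o \in \mathcal{O}_m(\beta, a, \beta')} \delta(\beta', \mathrm{update}_m(\beta, a, o)) \, \Pr[O=o \mid B=\beta, A=a] \tag{A.6} \\
&= \sum_{o \in \mathcal{O}_m(\beta, a, \beta')} \Pr[O=o \mid B=\beta, A=a] \tag{A.7} \\
&= \sum_{o \in \mathcal{O}_m(\beta, a, \beta')} N_m(\beta, a)(o) \tag{A.8}
\end{align*}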

Line A.2 follows from the Law of Total Probability since the agent can make only a single observation
after each action, making the different possible observations mutually exclusive events. Line A.3 follows
since $\Pr[B'=\beta' \wedge O=o \mid B=\beta, A=a]$ is zero for any $o$ not in $\mathcal{O}'_m(\beta, a)$. Line A.4 uses the Multiplication
Rule, which is well defined in this case since for all $o$ in $\mathcal{O}'_m(\beta, a)$, $\Pr[B=\beta, A=a, O=o]$ is non-zero.

Using that update is a deterministic function, we know that for each value of $\beta$, $a$, and $o$, the probability
$\Pr[B'=\beta' \mid B=\beta, A=a, O=o]$ is non-zero for only one value of $\beta'$ and that value is $\mathrm{update}_m(\beta, a, o)$. Thus,
$\Pr[B'=\beta' \mid B=\beta, A=a, O=o] = \delta(\beta', \mathrm{update}_m(\beta, a, o))$ and Line A.5 is justified. Line A.6 follows since
$\delta(\beta', \mathrm{update}_m(\beta, a, o))$ is zero for all $o$ that are in $\mathcal{O}'_m(\beta, a)$ but not in $\mathcal{O}_m(\beta, a, \beta')$, and
$\mathcal{O}_m(\beta, a, \beta') \subseteq \mathcal{O}'_m(\beta, a)$. Line A.7 follows since $\delta(\beta', \mathrm{update}_m(\beta, a, o))$ is 1 for all $o$ in $\mathcal{O}_m(\beta, a, \beta')$.
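The reduction above can be illustrated with a small sketch that builds the belief-MDP transition distribution by grouping observations according to the belief they lead to, as in Lines A.6 to A.8. The dictionary layout of the POMDP, the helper names `belief_update` and `belief_transition`, and the assumption that an observation depends on the action and the successor state are all hypothetical choices made for this illustration and are not taken from the text.

```python
from collections import defaultdict

def belief_update(m, belief, action, obs):
    """Return (next belief, probability of obs) for a belief given as {state: prob}.

    Assumed layout (hypothetical): m["trans"][(s, a)] is {s2: prob} and
    m["obs"][(a, s2)] is {observation: prob}.
    """
    unnorm = defaultdict(float)
    for s, p in belief.items():
        for s2, pt in m["trans"][(s, action)].items():
            unnorm[s2] += p * pt * m["obs"][(action, s2)].get(obs, 0.0)
    prob_obs = sum(unnorm.values())  # plays the role of N_m(belief, action)(obs)
    if prob_obs == 0.0:
        return None, 0.0             # obs is impossible under this belief and action
    # Freeze the posterior so that it can serve as a dictionary key.
    nxt = tuple(sorted((s2, round(q / prob_obs, 12))
                       for s2, q in unnorm.items() if q > 0.0))
    return nxt, prob_obs

def belief_transition(m, belief, action):
    """Sum the observation probabilities over all observations that lead to
    the same next belief, mirroring the final sum in Line A.8."""
    tau = defaultdict(float)
    for obs in m["observations"]:
        nxt, prob_obs = belief_update(m, belief, action, obs)
        if nxt is not None:
            tau[nxt] += prob_obs
    return dict(tau)

# Toy POMDP: observations "x" and "y" are equally informative, so they
# lead to the same next belief and their probabilities are added together.
toy = {
    "observations": ["x", "y", "z"],
    "trans": {(0, "a"): {0: 1.0}, (1, "a"): {1: 1.0}},
    "obs": {("a", 0): {"x": 0.25, "y": 0.25, "z": 0.5},
            ("a", 1): {"x": 0.4, "y": 0.4, "z": 0.2}},
}
tau = belief_transition(toy, {0: 0.5, 1: 0.5}, "a")
```

In the toy model only two next beliefs are reachable even though there are three observations, which is the finiteness property that the rest of this appendix relies on.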

The state space of $\mathrm{bmdp}(m)$ is uncountably infinite. Thus, some equations used for MDPs are not well
defined for the belief MDP $\mathrm{bmdp}(m)$. For example, the equation

$$v_{\mathrm{bmdp}(m)}(\sigma, \beta) = R_m(\beta, \sigma(\beta)) + \gamma \sum_{\beta' \in \mathcal{B}} \tau(\beta, \sigma(\beta))(\beta') * v_{\mathrm{bmdp}(m)}(\sigma, \beta')$$

involves a summation over an uncountably infinite number of belief states, which is not well defined.

Fortunately, we can adjust such equations to restrict summations to index over a countable subset of the
belief state space $\mathcal{B}$. This restriction is possible because for each current belief state, action, and observation,
only a single next belief state is possible. Thus, for each current belief state and action, only a finite number
of next belief states are possible, one for each observation. Let $T_m(\beta, a)$ denote these possible next states:

$T_m(\beta, a) = \{\, \beta' \in \mathcal{B} \mid \exists o \in \mathcal{O} \text{ s.t. } \mathrm{update}_m(\beta, a, o) = \beta' \,\}$. The subset $T_m(\beta, a)$ is finite since $\mathcal{O}$ is finite.
$\beta'$ is in $T_m(\beta, a)$ if and only if $\mathcal{O}_m(\beta, a, \beta')$ is not empty. Thus, if $\tau(\beta, a)(\beta') > 0$, then $\beta' \in T_m(\beta, a)$
since $\tau(\beta, a)(\beta') = \sum_{o \in \mathcal{O}_m(\beta, a, \beta')} N_m(\beta, a)(o)$. Thus, we may replace $\mathcal{B}$ in the above summation with
$T_m(\beta, \sigma(\beta))$ to get

$$v_{\mathrm{bmdp}(m)}(\sigma, \beta) = R_m(\beta, \sigma(\beta)) + \gamma \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \tau(\beta, \sigma(\beta))(\beta') * v_{\mathrm{bmdp}(m)}(\sigma, \beta')$$

Similar adjustments rescue other definitions that involve summations over $\mathcal{B}$. For example, $q^*_{\mathrm{bmdp}(m)}(\beta, a)$
becomes $R_m(\beta, a) + \gamma \sum_{\beta' \in T_m(\beta, a)} \tau(\beta, a)(\beta') * v^*_{\mathrm{bmdp}(m)}(\beta')$.

After making these adjustments, we can prove that the optimal value that a belief MDP $\mathrm{bmdp}(m)$ assigns
to a belief state is equal to the optimal value that the POMDP $m$ assigns to it, which establishes Proposition 3
in the next section.
A.2 Proof of Proposition 3


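For all $\beta$ and $\sigma$,

\begin{align*}
& \sum_{o \in \mathcal{O}} N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \mathrm{update}_m(\beta, \sigma(\beta), o)) \tag{A.9} \\
&= \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \sum_{o \in \mathcal{O}} \delta(\beta', \mathrm{update}_m(\beta, \sigma(\beta), o)) * N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \beta') \tag{A.10} \\
&= \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \sum_{o \in \mathcal{O}_m(\beta, \sigma(\beta), \beta')} \delta(\beta', \mathrm{update}_m(\beta, \sigma(\beta), o)) * N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \beta') \tag{A.11} \\
&= \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \sum_{o \in \mathcal{O}_m(\beta, \sigma(\beta), \beta')} N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \beta') \tag{A.12} \\
&= \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \Bigl( \sum_{o \in \mathcal{O}_m(\beta, \sigma(\beta), \beta')} N_m(\beta, \sigma(\beta))(o) \Bigr) * V_m(\sigma, \beta') \tag{A.13} \\
&= \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \tau(\beta, \sigma(\beta))(\beta') * V_m(\sigma, \beta') \tag{A.14}
\end{align*}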

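Line A.10 is justified since $\mathrm{update}_m(\beta, \sigma(\beta), o)$ is in $T_m(\beta, \sigma(\beta))$ by definition and $\delta(\beta', \mathrm{update}_m(\beta, \sigma(\beta), o))$
is equal to 1 at exactly one value of $\beta'$, when $\beta' = \mathrm{update}_m(\beta, \sigma(\beta), o)$, and is 0 for all other values of $\beta'$.
Line A.11 follows from the fact that for all $\beta$, $\beta'$, $\sigma$, and $o$, $\delta(\beta', \mathrm{update}_m(\beta, \sigma(\beta), o))$ is equal to 0 if $o$ is
not in $\mathcal{O}_m(\beta, \sigma(\beta), \beta') = \{\, o \in \mathcal{O} \mid \mathrm{update}_m(\beta, \sigma(\beta), o) = \beta' \,\}$. Line A.12 is justified since for all $\beta$, $\beta'$,
$\sigma$, and $o$, $\delta(\beta', \mathrm{update}_m(\beta, \sigma(\beta), o))$ is equal to 1 if $o$ is in $\mathcal{O}_m(\beta, \sigma(\beta), \beta')$. Line A.13 regroups the terms by
factoring $V_m(\sigma, \beta')$, which does not depend on $o$, out of the inner summation. Line A.14 follows from Line A.8.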

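The above equation allows us to conclude the following:

\begin{align*}
V^*_m(\beta) &= \max_{\sigma} V_m(\sigma, \beta) \\
&= \max_{\sigma} \Bigl[ R_m(\beta, \sigma(\beta)) + \gamma \sum_{o \in \mathcal{O}} N_m(\beta, \sigma(\beta))(o) * V_m(\sigma, \mathrm{update}_m(\beta, \sigma(\beta), o)) \Bigr] \\
&= \max_{\sigma} \Bigl[ R_m(\beta, \sigma(\beta)) + \gamma \sum_{\beta' \in T_m(\beta, \sigma(\beta))} \tau(\beta, \sigma(\beta))(\beta') * V_m(\sigma, \beta') \Bigr] \\
&= \max_{\sigma} v_{\mathrm{bmdp}(m)}(\sigma, \beta) \\
&= v^*_{\mathrm{bmdp}(m)}(\beta)
\end{align*}

Furthermore,

\begin{align*}
Q^*_m(\beta, a) &= R_m(\beta, a) + \gamma \sum_{o \in \mathcal{O}} N_m(\beta, a)(o) * V^*_m(\mathrm{update}_m(\beta, a, o)) \\
&= R_m(\beta, a) + \gamma \sum_{\beta' \in T_m(\beta, a)} \tau(\beta, a)(\beta') * v^*_{\mathrm{bmdp}(m)}(\beta') \\
&= q^*_{\mathrm{bmdp}(m)}(\beta, a)
\end{align*}

where the middle lines follow from the same reasoning as above.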
Appendix B


Details  of  Empirical  Study 


B.1 Questionnaire

Below  is  the  content  of  the  questionnaire.  The  formatting  differed  in  that  it  was  broken  up  into  multiple 
webpages.  Initial  instructions  were  shown  on  Mechanical  Turk’s  website  (Appendix  B.1.1).  The  additional 
instructions, questions, and payment information were shown on Survey Gizmo’s website (Appendix B.1.2).
Survey Gizmo always showed the additional instructions first and the payment information last. For each
participant, Survey Gizmo presented the scenarios in a random order, each on its own webpage. Survey Gizmo
numbers  the  questions  dynamically  based  upon  the  order  in  which  Survey  Gizmo  presents  the  scenarios. 

Recall  that,  for  each  scenario,  the  first  two  questions  (Q1  and  Q2)  have  objectively  correct  answers  that 
the  survey  participant  may  easily  find  by  reading  the  scenario  and  we  use  them  to  determine  whether  the 
participant  put  thought  into  answering  the  questions. 

B.1.1  Mechanical  Turk 

If  you  choose  to  participate,  you  will  be  asked  a  series  questions  [sic]  about  when  an  action  is  for  a  purpose. 
If  you  fill  out  the  survey  reasonably  (do  not  just  randomly  select  answers),  you  will  be  paid  for  your  par¬ 
ticipation.  The  risks  of  taking  this  survey  are  equivalent  to  every  day  computer  use.  Your  participation  is 
voluntary. 

If  you  choose  to  participate,  then  fill  out  the  survey  at  SurveyGizmo  using  the  following  link: 
http://edu.surveygizmo.com/s3/621146/Hospital-Survey
Upon  completion  enter  the  last  four  digits  of  your  phone  number  here: 


We  ask  for  this  number  so  we  can  track  who  successfully  completed  the  survey.  We  will  ask  you  to  enter 
the  same  number  at  SurveyGizmo. 

B.1.2 Survey Gizmo

Instructions.  Metropolis  General  Hospital  has  the  following  privacy  policy: 

Metropolis  General  Hospital  and  its  employees  will  share  a  patient’s  medical  record  with  an 
outside  specialist  only  for  the  purpose  of  providing  that  patient  with  treatment. 
For each scenario below, please answer the following questions based on your understanding of the
above  policy. 


Scenario  1.  Metropolis  General  Hospital  has  the  following  privacy  policy: 

Metropolis  General  Hospital  and  its  employees  will  share  a  patient’s  medical  record  with  an 
outside  specialist  only  for  the  purpose  of  providing  that  patient  with  treatment. 

Please  answer  the  following  questions  based  on  your  understanding  of  the  above  policy  and  the  following 
scenario: 

A  case  worker  employed  by  Metropolis  General  Hospital  meets  with  a  patient.  The  case  worker 
develops  a  plan  with  the  sole  goal  of  treating  the  patient.  The  plan  includes  sharing  the  patient’s 
medical  record  with  an  outside  specialist.  Upon  receiving  the  record,  the  specialist  succeeds  in  treating 
the  patient. 

1.  Was  the  goal  of  the  case  worker’s  plan  to  treat  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

2.  Did  the  specialist  succeed  in  treating  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

3.  Did  the  case  worker  share  the  record  with  the  specialist  for  the  purpose  of  treatment? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

4.  Did  the  case  worker  obey  the  above  privacy  policy? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

5.  Why  did  you  answer  Question  4  as  you  did? 




Scenario  2.  Metropolis  General  Hospital  has  the  following  privacy  policy: 

Metropolis  General  Hospital  and  its  employees  will  share  a  patient’s  medical  record  with  an 
outside  specialist  only  for  the  purpose  of  providing  that  patient  with  treatment. 

Please  answer  the  following  questions  based  on  your  understanding  of  the  above  policy  and  the  following 
scenario: 

A  case  worker  employed  by  Metropolis  General  Hospital  meets  with  a  patient.  The  case  worker 
develops  a  plan  with  the  sole  goal  of  treating  the  patient.  The  plan  includes  sharing  the  patient’s 
medical  record  with  an  outside  specialist.  Upon  receiving  the  record,  the  specialist  did  not  succeed  in 
treating  the  patient. 

1.  Was  the  goal  of  the  case  worker’s  plan  to  treat  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

2.  Did  the  specialist  succeed  in  treating  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

3.  Did  the  case  worker  share  the  record  with  the  specialist  for  the  purpose  of  treatment? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

4.  Did  the  case  worker  obey  the  above  privacy  policy? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

5.  Why  did  you  answer  Question  4  as  you  did? 


Scenario  3.  Metropolis  General  Hospital  has  the  following  privacy  policy: 

Metropolis  General  Hospital  and  its  employees  will  share  a  patient’s  medical  record  with  an 
outside  specialist  only  for  the  purpose  of  providing  that  patient  with  treatment. 
Please answer the following questions based on your understanding of the above policy and the following
scenario: 

A  case  worker  employed  by  Metropolis  General  Hospital  meets  with  a  patient.  The  case  worker 
develops  a  plan  with  the  sole  goal  of  reducing  costs  for  the  hospital.  The  plan  includes  sharing  the 
patient’s  medical  record  with  an  outside  specialist.  Upon  receiving  the  record,  the  specialist  succeeds 
in  treating  the  patient. 

1.  Was  the  goal  of  the  case  worker’s  plan  to  treat  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

2.  Did  the  specialist  succeed  in  treating  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

3.  Did  the  case  worker  share  the  record  with  the  specialist  for  the  purpose  of  treatment? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

4.  Did  the  case  worker  obey  the  above  privacy  policy? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

5.  Why  did  you  answer  Question  4  as  you  did? 


Scenario  4.  Metropolis  General  Hospital  has  the  following  privacy  policy: 

Metropolis  General  Hospital  and  its  employees  will  share  a  patient’s  medical  record  with  an 
outside  specialist  only  for  the  purpose  of  providing  that  patient  with  treatment. 

Please  answer  the  following  questions  based  on  your  understanding  of  the  above  policy  and  the  following 
scenario: 

A  case  worker  employed  by  Metropolis  General  Hospital  meets  with  a  patient.  The  case  worker 
develops  a  plan  with  the  sole  goal  of  reducing  costs  for  the  hospital.  The  plan  includes  sharing  the 
patient’s  medical  record  with  an  outside  specialist.  Upon  receiving  the  record,  the  specialist  did  not 
succeed  in  treating  the  patient. 
1. Was the goal of the case worker’s plan to treat the patient?

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

2.  Did  the  specialist  succeed  in  treating  the  patient? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

3.  Did  the  case  worker  share  the  record  with  the  specialist  for  the  purpose  of  treatment? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

4.  Did  the  case  worker  obey  the  above  privacy  policy? 

(a)  Yes 

(b)  No 

(c)  I  don’t  know 

5.  Why  did  you  answer  Question  4  as  you  did? 


Payment  Information.  To  receive  payment  on  Mechanical  Turk,  please  enter  the  last  four  digits  of  your 
phone  number  here: 

B.2  Mechanical  Turk  Advertisement 

Research  survey  on  the  meaning  of  privacy 

Requester:  Michael  Carl  Tschantz  HIT  Expiration  Date:  Jul  20,  2011  (2  weeks  5  days)  Reward:  $0.50 

Time  Allotted:  10  minutes  HITs  Available:  200 

Description:  Take  a  short  survey  about  how  you  interpret  a  privacy  policy  to  help 
research  on  the  topic  taking  place  at  Carnegie  Mellon  University. 

Keywords:  Survey,  Research 

Qualifications  Required: 

HIT  approval  rate  (%)  is  greater  than  95 
Location  is  US 
B.3 Tables of Matched Pairs


Question Q3 (rows: $S_{\bar{p}f}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                25         2         16
I don't know        0         4          2
No                 13         4        121

Question Q3 (rows: $S_{pf}$; columns: $S_{p\bar{f}}$)

                  Yes   I don't know    No
Yes               182         1          2
I don't know        1         0          1
No                  0         0          0

Question Q4 (rows: $S_{pf}$; columns: $S_{p\bar{f}}$)

                  Yes   I don't know    No
Yes               176         0          6
I don't know        0         2          0
No                  1         0          2

Question Q4 (rows: $S_{\bar{p}f}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                26         3         16
I don't know        1         5          3
No                  4         1        128

Question Q3 (rows: $S_{p\bar{f}}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                37         9        137
I don't know        0         0          1
No                  1         1          1

Question Q3 (rows: $S_{pf}$; columns: $S_{\bar{p}f}$)

                  Yes   I don't know    No
Yes                43         6        136
I don't know        0         0          2
No                  0         0          0

Question Q4 (rows: $S_{p\bar{f}}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                30         8        139
I don't know        0         1          1
No                  1         0          7

Question Q4 (rows: $S_{pf}$; columns: $S_{\bar{p}f}$)

                  Yes   I don't know    No
Yes                45         8        129
I don't know        0         1          1
No                  0         0          3

Scenario $S_{p\bar{f}}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes               176         1          0
I don't know        2         0          0
No                  5         0          3

Scenario $S_{pf}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes               181         1          0
I don't know        1         1          0
No                  3         0          0

Scenario $S_{\bar{p}\bar{f}}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes                22         0          9
I don't know        2         5          2
No                 14         5        128

Scenario $S_{\bar{p}f}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes                32         2         11
I don't know        1         3          5
No                 10         1        122



B.4 Results Using All Respondents


Scenario                    Yes           I don't know    No
$S_{pf}$                    205 (99%)     1 (00%)         1 (00%)
$S_{p\bar{f}}$              202 (98%)     2 (01%)         3 (01%)
$S_{\bar{p}f}$              25 (12%)      5 (02%)         177 (86%)
$S_{\bar{p}\bar{f}}$        18 (09%)      2 (01%)         187 (90%)

Q1: Was the goal treatment? (question with an objectively correct answer)


Scenario                    Yes           I don't know    No
$S_{pf}$                    206 (100%)    1 (00%)         0 (00%)
$S_{p\bar{f}}$              3 (01%)       1 (00%)         203 (98%)
$S_{\bar{p}f}$              196 (95%)     3 (01%)         8 (04%)
$S_{\bar{p}\bar{f}}$        5 (02%)       0 (00%)         202 (98%)

Q2: Was the treatment successful? (question with an objectively correct answer)


Scenario                    Yes           I don't know    No
$S_{pf}$                    205 (99%)     2 (01%)         0 (00%)
$S_{p\bar{f}}$              202 (98%)     2 (01%)         3 (01%)
$S_{\bar{p}f}$              59 (29%)      9 (04%)         139 (67%)
$S_{\bar{p}\bar{f}}$        51 (25%)      14 (07%)        142 (69%)

Q3: Was the action for the purpose?


Scenario                    Yes           I don't know    No
$S_{pf}$                    201 (97%)     2 (01%)         4 (02%)
$S_{p\bar{f}}$              195 (94%)     3 (01%)         9 (04%)
$S_{\bar{p}f}$              61 (29%)      11 (05%)        135 (65%)
$S_{\bar{p}\bar{f}}$        44 (21%)      12 (06%)        151 (73%)

Q4: Was the policy obeyed?


Table B.1: Survey Results for All Respondents
Testing         Alternative Hypothesis            Null Hypothesis                   p-Value          Significant?

Against H1a     $p_{pfy} < 0.5$                   $p_{pfy} = 0.5$                   1                No
Against H1a     $p_{pfn} > 0.5$                   $p_{pfn} = 0.5$                   1                No
Against H1a     $p_{\bar{p}fy} < 0.5$             $p_{\bar{p}fy} = 0.5$             1.59774e-009     Yes
Against H1a     $p_{\bar{p}fn} > 0.5$             $p_{\bar{p}fn} = 0.5$             7.097797e-006    Yes

Against H1a’    $p'_{pfy} < 0.5$                  $p'_{pfy} = 0.5$                  1                No
Against H1a’    $p'_{pfn} > 0.5$                  $p'_{pfn} = 0.5$                  1                No
Against H1a’    $p'_{\bar{p}fy} < 0.5$            $p'_{\bar{p}fy} = 0.5$            2.606856e-010    Yes
Against H1a’    $p'_{\bar{p}fn} > 0.5$            $p'_{\bar{p}fn} = 0.5$            4.514694e-007    Yes

Against H1b     $p_{p\bar{f}n} < 0.5$             $p_{p\bar{f}n} = 0.5$             8.206618e-048    Yes
Against H1b     $p_{p\bar{f}y} > 0.5$             $p_{p\bar{f}y} = 0.5$             4.833563e-044    Yes
Against H1b     $p_{\bar{p}\bar{f}n} < 0.5$       $p_{\bar{p}\bar{f}n} = 0.5$       1                No
Against H1b     $p_{\bar{p}\bar{f}y} > 0.5$       $p_{\bar{p}\bar{f}y} = 0.5$       1                No

Against H1b’    $p'_{p\bar{f}n} < 0.5$            $p'_{p\bar{f}n} = 0.5$            7.187894e-057    Yes
Against H1b’    $p'_{p\bar{f}y} > 0.5$            $p'_{p\bar{f}y} = 0.5$            1.503496e-053    Yes
Against H1b’    $p'_{\bar{p}\bar{f}n} < 0.5$      $p'_{\bar{p}\bar{f}n} = 0.5$      1                No
Against H1b’    $p'_{\bar{p}\bar{f}y} > 0.5$      $p'_{\bar{p}\bar{f}y} = 0.5$      1                No

For H2a         $p_{pfy} > 0.5$                   $p_{pfy} = 0.5$                   5.08808e-052     Yes
For H2a         $p_{pfn} < 0.5$                   $p_{pfn} = 0.5$                   3.684324e-055    Yes
For H2a         $p_{p\bar{f}y} > 0.5$             $p_{p\bar{f}y} = 0.5$             4.833563e-044    Yes
For H2a         $p_{p\bar{f}n} < 0.5$             $p_{p\bar{f}n} = 0.5$             8.206618e-048    Yes

For H2a’        $p'_{pfy} > 0.5$                  $p'_{pfy} = 0.5$                  1.046682e-058    Yes
For H2a’        $p'_{pfn} < 0.5$                  $p'_{pfn} = 0.5$                  4.861731e-063    Yes
For H2a’        $p'_{p\bar{f}y} > 0.5$            $p'_{p\bar{f}y} = 0.5$            1.503496e-053    Yes
For H2a’        $p'_{p\bar{f}n} < 0.5$            $p'_{p\bar{f}n} = 0.5$            7.187894e-057    Yes

For H2b         $p_{\bar{p}fn} > 0.5$             $p_{\bar{p}fn} = 0.5$             7.097797e-006    Yes
For H2b         $p_{\bar{p}fy} < 0.5$             $p_{\bar{p}fy} = 0.5$             1.59774e-009     Yes
For H2b         $p_{\bar{p}\bar{f}n} > 0.5$       $p_{\bar{p}\bar{f}n} = 0.5$       1.443359e-011    Yes
For H2b         $p_{\bar{p}\bar{f}y} < 0.5$       $p_{\bar{p}\bar{f}y} = 0.5$       1.440142e-017    Yes

For H2b’        $p'_{\bar{p}fn} > 0.5$            $p'_{\bar{p}fn} = 0.5$            4.514694e-007    Yes
For H2b’        $p'_{\bar{p}fy} < 0.5$            $p'_{\bar{p}fy} = 0.5$            2.606856e-010    Yes
For H2b’        $p'_{\bar{p}\bar{f}n} > 0.5$      $p'_{\bar{p}\bar{f}n} = 0.5$      4.581869e-008    Yes
For H2b’        $p'_{\bar{p}\bar{f}y} < 0.5$      $p'_{\bar{p}\bar{f}y} = 0.5$      7.161858e-014    Yes

Table B.2: Binomial Hypothesis Tests for All Respondents
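As an illustration of how a p-value of this kind can be computed, the following sketch evaluates a one-sided exact binomial test from first principles. It assumes that each test in the table is a one-sided exact binomial test on the raw response counts of Table B.1 with n = 207 respondents; the function name `binom_tail` and its parameters are hypothetical and chosen only for this example.

```python
from math import comb

def binom_tail(k, n, p0, alternative):
    """Exact one-sided binomial p-value for k successes out of n trials
    under the null success probability p0 (hypothetical helper)."""
    if alternative == "less":        # P[X <= k]
        ks = range(0, k + 1)
    elif alternative == "greater":   # P[X >= k]
        ks = range(k, n + 1)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in ks)

# Counts taken from Table B.1: 61 of 207 respondents answered "Yes" to Q4
# in the third scenario; the alternative is that fewer than half say "Yes".
p_value = binom_tail(61, 207, 0.5, "less")
print(f"{p_value:.6e}")
```

Under these assumptions the printed value should be close to the 1.59774e-009 reported in the table for the corresponding row; rows with a p-value of 1 correspond to counts that lie far on the other side of the null proportion.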
Testing          Alternative Hypothesis             Null Hypothesis

Proving H2a      $p_{pfy} > 0.94$                   $p_{pfy} = 0.94$
Proving H2a      $p_{pfn} < 0.05$                   $p_{pfn} = 0.05$
Proving H2a      $p_{p\bar{f}y} > 0.9$              $p_{p\bar{f}y} = 0.9$
Proving H2a      $p_{p\bar{f}n} < 0.08$             $p_{p\bar{f}n} = 0.08$

Proving H2a’     $p'_{pfy} > 0.96$                  $p'_{pfy} = 0.96$
Proving H2a’     $p'_{pfn} < 0.02$                  $p'_{pfn} = 0.02$
Proving H2a’     $p'_{p\bar{f}y} > 0.94$            $p'_{p\bar{f}y} = 0.94$
Proving H2a’     $p'_{p\bar{f}n} < 0.04$            $p'_{p\bar{f}n} = 0.04$

Proving H2b      $p_{\bar{p}fn} > 0.59$             $p_{\bar{p}fn} = 0.59$
Proving H2b      $p_{\bar{p}fy} < 0.36$             $p_{\bar{p}fy} = 0.36$
Proving H2b      $p_{\bar{p}\bar{f}n} > 0.67$       $p_{\bar{p}\bar{f}n} = 0.67$
Proving H2b      $p_{\bar{p}\bar{f}y} < 0.27$       $p_{\bar{p}\bar{f}y} = 0.27$

Proving H2b’     $p'_{\bar{p}fn} > 0.61$            $p'_{\bar{p}fn} = 0.61$
Proving H2b’     $p'_{\bar{p}fy} < 0.35$            $p'_{\bar{p}fy} = 0.35$
Proving H2b’     $p'_{\bar{p}\bar{f}n} > 0.62$      $p'_{\bar{p}\bar{f}n} = 0.62$
Proving H2b’     $p'_{\bar{p}\bar{f}y} < 0.31$      $p'_{\bar{p}\bar{f}y} = 0.31$

Table B.3: Extreme Binomial Hypothesis Tests for All Respondents. This table shows the hypothesis test using the
most extreme probability for which statistical significance is still achieved and is accurate up to two places after the
decimal point.
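One way to find such an extreme probability is to scan candidate null probabilities in steps of 0.01 and keep the most demanding one that is still rejected. The sketch below does this with an exact one-sided binomial test. The significance level of 0.05, the step size, and the names `binom_tail` and `most_extreme_null` are assumptions made for illustration and are not stated in this appendix.

```python
from math import comb

def binom_tail(k, n, p0, alternative):
    """Exact one-sided binomial p-value: P[X <= k] for "less",
    P[X >= k] for "greater", with X ~ Binomial(n, p0)."""
    ks = range(0, k + 1) if alternative == "less" else range(k, n + 1)
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in ks)

def most_extreme_null(k, n, alternative, alpha=0.05):
    """Scan null probabilities 0.01, 0.02, ..., 0.99 and return the most
    extreme one that is still rejected at level alpha (assumed 0.05)."""
    if alternative == "greater":
        candidates = range(99, 0, -1)   # largest rejected p0 comes first
    else:
        candidates = range(1, 100)      # smallest rejected p0 comes first
    for c in candidates:
        p0 = c / 100
        if binom_tail(k, n, p0, alternative) < alpha:
            return p0
    return None                         # no candidate is rejected

# Q4 counts for the first scenario in Table B.1: 201 "Yes" and 4 "No" of 207.
print(most_extreme_null(201, 207, "greater"))
print(most_extreme_null(4, 207, "less"))
```

Because the tail probability is monotone in the null probability, the first rejected candidate in each scan direction is the most extreme one. With the counts shown, the two calls return 0.94 and 0.05, which agrees with the first two rows of the table.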
Question Q3 (rows: $S_{pf}$; columns: $S_{p\bar{f}}$)

                  Yes   I don't know    No
Yes               201         2          2
I don't know        1         0          1
No                  0         0          0

Question Q4 (rows: $S_{pf}$; columns: $S_{p\bar{f}}$)

                  Yes   I don't know    No
Yes               194         1          6
I don't know        0         2          0
No                  1         0          3

Question Q3 (rows: $S_{pf}$; columns: $S_{\bar{p}f}$)

                  Yes   I don't know    No
Yes                59         9        137
I don't know        0         0          2
No                  0         0          0

Question Q4 (rows: $S_{pf}$; columns: $S_{\bar{p}f}$)

                  Yes   I don't know    No
Yes                61        10        130
I don't know        0         1          1
No                  0         0          4

Scenario $S_{pf}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes               200         1          0
I don't know        1         1          0
No                  4         0          0

Scenario $S_{\bar{p}f}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes                47         3         11
I don't know        1         5          5
No                 11         1        123

Question Q3 (rows: $S_{\bar{p}f}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                38         3         18
I don't know        0         7          2
No                 13         4        122

Question Q4 (rows: $S_{\bar{p}f}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                39         4         18
I don't know        1         7          3
No                  4         1        130

Question Q3 (rows: $S_{p\bar{f}}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                50        12        140
I don't know        0         1          1
No                  1         1          1

Question Q4 (rows: $S_{p\bar{f}}$; columns: $S_{\bar{p}\bar{f}}$)

                  Yes   I don't know    No
Yes                43        10        142
I don't know        0         2          1
No                  1         0          8

Scenario $S_{p\bar{f}}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes               194         1          0
I don't know        2         1          0
No                  6         0          3

Scenario $S_{\bar{p}\bar{f}}$ (rows: Q4; columns: Q3)

                  Yes   I don't know    No
Yes                34         1          9
I don't know        2         8          2
No                 15         5        131

Table B.4: Matched Pairs for All Respondents
Testing      Question   Scenarios                                      p-Value          Significant?

For H1c      Q4         $S_{pf}$ vs. $S_{p\bar{f}}$                    NaN              No
For H1c      Q4         $S_{\bar{p}f}$ vs. $S_{\bar{p}\bar{f}}$        0.008449127      Yes
For H1c’     Q3         $S_{pf}$ vs. $S_{p\bar{f}}$                    0.3430301        No
For H1c’     Q3         $S_{\bar{p}f}$ vs. $S_{\bar{p}\bar{f}}$        0.2147006        No
For H2c      Q4         $S_{pf}$ vs. $S_{\bar{p}f}$                    2.300576e-030    Yes
For H2c      Q4         $S_{p\bar{f}}$ vs. $S_{\bar{p}\bar{f}}$        2.598558e-032    Yes
For H2c’     Q3         $S_{pf}$ vs. $S_{\bar{p}f}$                    7.115157e-032    Yes
For H2c’     Q3         $S_{p\bar{f}}$ vs. $S_{\bar{p}\bar{f}}$        4.269341e-032    Yes

Table B.5: McNemar’s Test Across Scenarios for All Respondents


Scenario                    Questions    p-Value      Significant?

$S_{pf}$                    Q4 vs. Q3    NaN          No
$S_{p\bar{f}}$              Q4 vs. Q3    NaN          No
$S_{\bar{p}f}$              Q4 vs. Q3    0.2997806    No
$S_{\bar{p}\bar{f}}$        Q4 vs. Q3    0.3736321    No

Table B.6: McNemar’s Test Across Questions for All Respondents
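To make the link between the matched-pair tables and these p-values concrete, the sketch below applies the generalized (Bowker) form of McNemar's test to a 3-by-3 table of paired answers. That this generalized form is the variant used here is an assumption, as are the names `chi2_sf_df3` and `mcnemar_bowker_3x3`. The chi-square tail is written in closed form for three degrees of freedom so that only the standard library is needed.

```python
from math import erfc, exp, isnan, pi, sqrt

def chi2_sf_df3(x):
    """Upper tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

def mcnemar_bowker_3x3(table):
    """Symmetry test for a 3x3 matched-pairs table (hypothetical helper).

    Each pair of off-diagonal cells contributes (n_ij - n_ji)^2 / (n_ij + n_ji).
    If some pair is empty in both cells, the statistic is 0/0 and NaN is returned.
    """
    stat = 0.0
    for i in range(3):
        for j in range(i + 1, 3):
            total = table[i][j] + table[j][i]
            if total == 0:
                return float("nan")
            stat += (table[i][j] - table[j][i]) ** 2 / total
    return chi2_sf_df3(stat)

# Matched-pair counts for Q3 (answers Yes / I don't know / No), taken from
# the first matched-pairs table for all respondents (Table B.4).
q3_pairs = [[201, 2, 2],
            [1, 0, 1],
            [0, 0, 0]]
# Matched-pair counts for Q4 from the second table of Table B.4.
q4_pairs = [[194, 1, 6],
            [0, 2, 0],
            [1, 0, 3]]
p_q3 = mcnemar_bowker_3x3(q3_pairs)
p_q4 = mcnemar_bowker_3x3(q4_pairs)
print(p_q3, p_q4)
```

Under these assumptions the first table gives a p-value of about 0.343 and the second gives NaN because one pair of off-diagonal cells is empty, which is consistent with the corresponding rows of Table B.5.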
Appendix C

Notation 


a  an  action 

A  a  random  variable  over  actions 

A  the  action  space  of  an  MDP  or  POMDP 

active  the  prefix  of  a  given  execution  before  the  first  instance  of  the  action  stop 
ad  an  advertising  action  in  the  advertising  example 
b  a  behavior 

B  a  random  variable  over  belief  states 

B  space  of  all  possible  belief  states 

behv  the  set  of  optimal  behaviors  for  a  given  MDP  or  POMDP 
B  the  binomial  distribution  for  the  provided  parameters 
bmdp  the  belief  MDP  of  a  given  POMDP 

d  ranges  over  {f,  m,  _L},  the  possible  values  of  the  database  in  the  advertising  example 
degen  a  degenerate  distribution:  for  any  x,  degen  (x)  assigns  probability  1  to  x 
Dist  the  space  of  distributions  over  a  given  set 
e  an  execution  of  an  MDP  or  POMDP 
E  expectation 

f  stands  for  female  in  the  advertising  example 

F  cumulative  distribution  function 

g  ranges  over  {f,  m},  the  possible  sexes  of  a  website  user  in  the  advertising  example 
h  the  number  of  steps  modeled  for  multi-step  RHIO  application
H  a  hypothesis  about  the  survey 
i  an  index 

j  an  index 

ℓ  a  log

L  set  of  all  possible  logs 
log  function  from  a  behavior  to  a  log 
log-1  the  inverse  of  log 

m  an  MDP  or  POMDP 

madv  a POMDP modeling the advertising example
mphy  a  POMDP  modeling  the  physician  example 
m  stands  for  male  in  the  advertising  example 
n  the  length  of  a  sequence  or  the  sample  size  of  the  survey 
N  ν lifted for POMDPs

ℕ  the set of natural numbers: {1, 2, 3, ...}

nbehv  the  set  of  non-redundant  optimal  behaviors  of  a  given  MDP  or  POMDP 
nopt  set  of  non-redundant  optimal  strategies  of  a  given  MDP  or  POMDP 
o  an observation of a POMDP

O  a  random  variable  over  observations  of  POMDP  (often  denoted  by  Q)  of  an  equivalence  class  of 

O  space  of  observations  of  a  POMDP 

opt  set  of  optimal  strategies  of  a  given  MDP  or  POMDP 

p  a  purpose  or  a  probability  in  Ch  5 

Pr  the  probability  of  an  expression 

q  value  function  of  a  state  and  an  action  of  an  MDP  given  a  strategy  (typically  denoted  by  Q) 

q*  optimal  value  function  of  a  state  and  an  action  of  an  MDP  (typically  denoted  by  Q*) 

q*  a  sub-routine  for  computing  q*  from  v* 
Q  value function of a state and an action of a POMDP given a strategy
Q*  optimal  value  function  of  a  state  and  an  action  of  a  POMDP 
Q*  a  sub-routine  for  computing  Q*  from  V* 

Q  a  question  of  the  survey 
r  reward  function  for  an  MDP  or  POMDP 
R  reward  function  r  raised  for  POMDPs 

ℝ  set of real numbers

s  a  state 

S  a  random  variable  over  states 

S  state  space 

S  a  scenario  in  the  survey 

stop  the distinguished action of an NMDP or NPOMDP that indicates stopping and doing nothing more
t  transition relation of an MDP or POMDP

update  the function that updates an agent's beliefs given an action and an observation of a POMDP
v  value  function  of  a  state  of  an  MDP  given  a  strategy  (typically  denoted  by  V) 

v*  optimal  value  function  of  a  state  of  an  MDP  (typically  denoted  by  V*) 

v*low  lower bounds on v*

v*  upper  bounds  on  v* 

V  value  function  of  a  state  of  a  POMDP  given  a  strategy 
V*  optimal  value  function  of  a  state  of  a  POMDP 

V*low  lower bounds on V*

V*  upper  bounds  on  V* 

x  ranges  over  X-rays  in  the  physician  example  or  over  the  elements  of  any  set 

y  the  number  of  yes  responses  to  the  survey  or  ranges  over  the  elements  of  any  set 

Y  random  variable  over  number  of  yes  responses  to  the  survey 

α  ranges over which advertisement (or none) the website could have shown in the advertising example;
also the significance level used for hypothesis testing in the survey
β  a belief state of a belief MDP

γ  discounting factor of an MDP or POMDP
Γ  set of belief states with non-zero probability

δ  Kronecker’s delta

δr  the increase in quality of treatment from reading a patient’s record in the RHIO application

δs  the increase in quality of treatment from studying medical literature in the RHIO application

κ  a contingency

ν  probability of an observation given a state and an action for a POMDP (often denoted by O)
ρ  a reward

ρl  the reward for treating a patient in the RHIO application
σ  a strategy (typically called “policy” and denoted by π)
τ  transition relation of a belief MDP

0  set  of  observations  that  could  have  resulted  in  a  given  updated  belief  for  a  POMDP 

0'  set  of  observations  possible  for  a  given  belief  state  and  action  for  a  POMDP 

⊥  stands for no information known about the user’s sex in the advertising example
o  a  dummy  observation  providing  no  information 

∅  stands for the website having not shown any advertisement to the user in the advertising example
×  cross product

≡  an equivalence relation over observations

≡adv  an equivalence relation used in the advertising example

≡phy  an equivalence relation used in the physician example

⊏  prefix relation over sequences

⊑  prefix-or-equal relation over sequences

≺  sub-strategy relation

◁  sub-execution relation

~  denotes  that  a  random  variable  obeys  a  distribution 